Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2013/07/02 08:01:18 UTC

[ANNOUNCE] Apache Tika 1.4 Released

The Apache Tika project is pleased to announce the release of Apache Tika
1.4. The release contents have been pushed out to the main Apache release
site and to the Maven Central sync, so the releases should be available as
soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.4 contains a number of improvements and bug fixes. Details can
be found in the changes file:

http://www.apache.org/dist/tika/CHANGES-1.4.txt

Apache Tika is available in source form from the following download page:

http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.4-src.zip

Apache Tika is also available in binary form, or for use with Maven 2, from
the Central Repository:

http://repo1.maven.org/maven2/org/apache/tika/

During the first 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads
using the signatures found on the Apache site:

https://people.apache.org/keys/group/tika.asc


For more information on Apache Tika, visit the project home page:

http://tika.apache.org/

-- Chris Mattmann, on behalf of the Apache Tika community



Re: [ANNOUNCE] Apache Tika 1.4 Released

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 8 Jul 2013, samir pendharkar wrote:
> I updated to Tika 1.4 and everything else seems to be working fine. However,
> RTF parsing in particular is failing with
> ArrayIndexOutOfBoundsException for at least 4-5 files. This worked
> correctly in the Tika 1.3 release, so it looks like a regression. Can
> somebody shed light on this (and how to fix it)?

I'd suggest you open a new bug in JIRA, then upload one of the files 
that's triggering the problem (ideally the smallest one!)
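
One way to pick the smallest failing file is sketched below. This is only a
minimal sketch under stated assumptions: the class name SmallestFailingFile,
the ParseAttempt callback and the smallestFailure method are hypothetical
names made up for illustration, not part of Tika. The callback is assumed to
wrap whatever parse call is failing for you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class SmallestFailingFile {

    /** Hypothetical callback: wrap the parse call that fails for you. */
    @FunctionalInterface
    public interface ParseAttempt {
        void parse(Path file) throws Exception;
    }

    /**
     * Returns the smallest file (by size in bytes) for which the parse
     * attempt throws, or an empty Optional if every file parses cleanly.
     */
    public static Optional<Path> smallestFailure(List<Path> files, ParseAttempt attempt)
            throws IOException {
        Path smallest = null;
        long smallestSize = Long.MAX_VALUE;
        for (Path file : files) {
            long size = Files.size(file);
            if (size >= smallestSize) {
                continue; // cannot beat the current candidate, so skip parsing it
            }
            try {
                attempt.parse(file);
            } catch (Exception e) {
                // Any exception counts as a failure here, including a wrapped
                // ArrayIndexOutOfBoundsException.
                smallest = file;
                smallestSize = size;
            }
        }
        return Optional.ofNullable(smallest);
    }
}
```

With Tika on the classpath, the callback would presumably open the file and
pass the stream to the parser's parse(InputStream, ContentHandler, Metadata,
ParseContext) method; that wiring is left out here so the sketch stays
free of dependencies.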

The only RTF parser change I can spot since 1.3 is this:
------------------------------------------------------------------------
r1437752 | mikemccand | 2013-01-23 21:38:38 +0000 (Wed, 23 Jan 2013) |

TIKA-1062: parse lists from RTF documents
------------------------------------------------------------------------

Nick

Re: [ANNOUNCE] Apache Tika 1.4 Released

Posted by samir pendharkar <sa...@gmail.com>.
Hi All,

I updated to Tika 1.4 and everything else seems to be working fine.
However, RTF parsing in particular is failing with
ArrayIndexOutOfBoundsException for at least 4-5 files. This worked
correctly in the Tika 1.3 release, so it looks like a regression.
Can somebody shed light on this (and how to fix it)?

This is the exception I get with Tika 1.4:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
    at org.apache.tika.parser.rtf.TextExtractor.processControlWord(TextExtractor.java:872)
    at org.apache.tika.parser.rtf.TextExtractor.parseControlWord(TextExtractor.java:566)
    at org.apache.tika.parser.rtf.TextExtractor.parseControlToken(TextExtractor.java:492)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:459)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:448)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)


