Posted to dev@tika.apache.org by Tyler Palsulich <tp...@apache.org> on 2015/01/06 07:59:43 UTC

[VOTE] Apache Tika 1.7 Release

Hi All,

A candidate for the Tika 1.7 release is available at:
    https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
    http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/

The SHA1 checksum of the archive is
    0307a8367ae6f8b1103824fd11337fd89e24e6a4.
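
A minimal sketch of checking a downloaded archive against the checksum above, using only Python's standard hashlib module. The local file name tika-1.7-src.zip is a hypothetical path chosen for illustration; it is not taken from this thread.

```python
import hashlib

# SHA1 value quoted in the vote announcement (trailing period removed).
ANNOUNCED_SHA1 = "0307a8367ae6f8b1103824fd11337fd89e24e6a4"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announced(path, announced=ANNOUNCED_SHA1):
    """Compare the file's digest with an announced checksum, ignoring case,
    surrounding whitespace and a trailing period."""
    expected = announced.strip().rstrip(".").lower()
    return sha1_of_file(path) == expected


# Hypothetical usage (the path is an assumption, not from the announcement):
#   matches_announced("tika-1.7-src.zip")
```

A mismatch only shows that the local file differs from the one the checksum was computed on; it does not say which of the two is wrong.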

In addition, a staged maven repository is available here:

https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/

Please vote on releasing this package as Apache Tika 1.7.

The vote is open for the next 72 hours and passes if a majority of at least
three +1 Tika PMC votes are cast.

    [ ] +1 Release this package as Apache Tika 1.7
    [ ] -1 Do not release this package because...

Thanks!
Tyler

P.S. Count this as my +1!
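
The passing rule stated in the announcement above can be sketched as a small check. This reads "a majority of at least three +1 Tika PMC votes" as: at least three +1 votes from PMC members, and more +1 than -1 among PMC votes. That reading is an assumption, as is the vote_passes helper itself; the thread does not say which voters are PMC members, so the caller has to supply that list.

```python
def vote_passes(pmc_votes):
    """Return True if the PMC votes satisfy the rule as read above.

    `pmc_votes` is an iterable of +1 / -1 integers, one per PMC member
    who voted. Votes from non-PMC members are assumed to be filtered
    out by the caller.
    """
    votes = list(pmc_votes)
    plus = sum(1 for v in votes if v == 1)
    minus = sum(1 for v in votes if v == -1)
    # Both conditions must hold: a floor of three +1 and a +1 majority.
    return plus >= 3 and plus > minus
```

Under this reading, three +1 and one -1 would pass on the numbers alone, while two +1 with no -1 would not.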

Re: [VOTE] Apache Tika 1.7 Release

Posted by Thomas Ledoux <tl...@gmail.com>.
+1

2015-01-06 7:59 GMT+01:00 Tyler Palsulich <tp...@apache.org>:

> Hi All,
>
> A candidate for the Tika 1.7 release is available at:
>     https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>     http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
>
> The SHA1 checksum of the archive is
>     0307a8367ae6f8b1103824fd11337fd89e24e6a4.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/
>
> Please vote on releasing this package as Apache Tika 1.7.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
>     [ ] +1 Release this package as Apache Tika 1.7
>     [ ] -1 Do not release this package because...
>
> Thanks!
> Tyler
>
> P.S. Count this as my +1!
>

Re: [VOTE] Apache Tika 1.7 Release

Posted by Thomas Ledoux <tl...@gmail.com>.
+1, works for me

2015-01-13 9:23 GMT+01:00 Tyler Palsulich <tp...@gmail.com>:

> Hi Folks,
>
> Let's mark this RC#2 as failed and shift the vote to the updated RC#3 (
> http://markmail.org/message/m5gpgmr7hedgpjdj), which has Tesseract
> metadata fixes and David's test fix.
>
> Thanks,
> Tyler
>
> On Thu, Jan 8, 2015 at 6:25 AM, Peter Bowyer <pe...@mapledesign.co.uk>
> wrote:
>
> > +1.
> >
> > Worked great once I manually
> > edited
> > tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
> > and set useNonSequentialParser to true
> >
> > Peter
> >
>

Re: [VOTE] Apache Tika 1.7 Release

Posted by Lewis John Mcgibbney <le...@gmail.com>.
+1

On Tue, Jan 13, 2015 at 3:23 AM, Tyler Palsulich <tp...@gmail.com>
wrote:

> Hi Folks,
>
> Let's mark this RC#2 as failed and shift the vote to the updated RC#3 (
> http://markmail.org/message/m5gpgmr7hedgpjdj), which has Tesseract
> metadata fixes and David's test fix.
>
> Thanks,
> Tyler
>
> On Thu, Jan 8, 2015 at 6:25 AM, Peter Bowyer <pe...@mapledesign.co.uk>
> wrote:
>
>> +1.
>>
>> Worked great once I manually
>> edited
>> tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
>> and set useNonSequentialParser to true
>>
>> Peter
>>
>
>


-- 
*Lewis*

Re: [VOTE] Apache Tika 1.7 Release

Posted by Tyler Palsulich <tp...@gmail.com>.
Hi Folks,

Let's mark this RC#2 as failed and shift the vote to the updated RC#3 (
http://markmail.org/message/m5gpgmr7hedgpjdj), which has Tesseract metadata
fixes and David's test fix.

Thanks,
Tyler

On Thu, Jan 8, 2015 at 6:25 AM, Peter Bowyer <pe...@mapledesign.co.uk>
wrote:

> +1.
>
> Worked great once I manually
> edited
> tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
> and set useNonSequentialParser to true
>
> Peter
>

Re: [VOTE] Apache Tika 1.7 Release

Posted by Peter Bowyer <pe...@mapledesign.co.uk>.
+1.

Worked great once I manually
edited tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
and set useNonSequentialParser to true

Peter

RE: [VOTE] Apache Tika 1.7 Release

Posted by Markus Jelsma <ma...@openindex.io>.
+1

 
 
-----Original message-----
> From:Sergey Beryozkin <sb...@gmail.com>
> Sent: Tuesday 6th January 2015 9:36
> To: user@tika.apache.org
> Subject: Re: [VOTE] Apache Tika 1.7 Release
> 
> +1
> Sergey
> On 06/01/15 09:59, Tyler Palsulich wrote:
> > Hi All,
> >
> > A candidate for the Tika 1.7 release is available at:
> > https://dist.apache.org/repos/dist/dev/tika/
> >
> > The release candidate is a zip archive of the sources in:
> > http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
> >
> > The SHA1 checksum of the archive is
> >      0307a8367ae6f8b1103824fd11337fd89e24e6a4.
> >
> > In addition, a staged maven repository is available here:
> > https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/
> >
> > Please vote on releasing this package as Apache Tika 1.7.
> >
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 Tika PMC votes are cast.
> >
> >      [ ] +1 Release this package as Apache Tika 1.7
> >      [ ] -1 Do not release this package because...
> >
> > Thanks!
> > Tyler
> >
> > P.S. Count this as my +1!
> 
> 

Re: [VOTE] Apache Tika 1.7 Release

Posted by Sergey Beryozkin <sb...@gmail.com>.
+1
Sergey
On 06/01/15 09:59, Tyler Palsulich wrote:
> Hi All,
>
> A candidate for the Tika 1.7 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
>
> The SHA1 checksum of the archive is
>      0307a8367ae6f8b1103824fd11337fd89e24e6a4.
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/
>
> Please vote on releasing this package as Apache Tika 1.7.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
>      [ ] +1 Release this package as Apache Tika 1.7
>      [ ] -1 Do not release this package because...
>
> Thanks!
> Tyler
>
> P.S. Count this as my +1!


Re: [VOTE] Apache Tika 1.7 Release

Posted by David Meikle <lo...@gmail.com>.
-1 on this for me too as there is a small unit test failure from ODFParser
on Windows from TIKA-1412.

I have added the tweak to fix this on trunk.

(I have also tested the latest changes added by Tim and Tyler in TIKA-1445
on Windows, Mac and Ubuntu with a decent batch of files, and everything is
working nicely at this end.)

On 7 January 2015 at 01:11, Allison, Timothy B. <ta...@mitre.org> wrote:

> -1
>
> I'm sorry that I haven't had a chance to kick the tires on the recent
> changes to the metadata extraction from images until now, but it looks like
> 1.7-rc2 and trunk are not pulling metadata from embedded images.
>
> I've posted a test file from govdocs1 to TIKA-1445.  I may have time
> tomorrow to see what's going on.  I should also have time tomorrow to
> finish the analysis of the comparison between 1.6 and 1.7 on govdocs1.
>
> Sorry for my delay, all!  And even greater apologies if user error is at
> fault and metadata is successfully being extracted from embedded images. :)
>
> Thank you, Tyler, for running this release!
>
>
> -----Original Message-----
> From: Nick Burch [mailto:apache@gagravarr.org]
> Sent: Tuesday, January 06, 2015 11:36 AM
> To: dev@tika.apache.org
> Subject: Re: [VOTE] Apache Tika 1.7 Release
>
> On Tue, 6 Jan 2015, Tyler Palsulich wrote:
> > A candidate for the Tika 1.7 release is available at:
> >    https://dist.apache.org/repos/dist/dev/tika/
> >
> > The release candidate is a zip archive of the sources in:
> >    http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
> >
> > The SHA1 checksum of the archive is
> >    0307a8367ae6f8b1103824fd11337fd89e24e6a4.
> >
> > In addition, a staged maven repository is available here:
> >
> > https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/
>
> Looks good to me, I'm +1
>
> Nick
>

RE: [VOTE] Apache Tika 1.7 Release

Posted by "Allison, Timothy B." <ta...@mitre.org>.
-1

I'm sorry that I haven't had a chance to kick the tires on the recent changes to the metadata extraction from images until now, but it looks like 1.7-rc2 and trunk are not pulling metadata from embedded images.

I've posted a test file from govdocs1 to TIKA-1445.  I may have time tomorrow to see what's going on.  I should also have time tomorrow to finish the analysis of the comparison between 1.6 and 1.7 on govdocs1.

Sorry for my delay, all!  And even greater apologies if user error is at fault and metadata is successfully being extracted from embedded images. :)

Thank you, Tyler, for running this release!


-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org] 
Sent: Tuesday, January 06, 2015 11:36 AM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.7 Release

On Tue, 6 Jan 2015, Tyler Palsulich wrote:
> A candidate for the Tika 1.7 release is available at:
>    https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>    http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
>
> The SHA1 checksum of the archive is
>    0307a8367ae6f8b1103824fd11337fd89e24e6a4.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/

Looks good to me, I'm +1

Nick

Re: [VOTE] Apache Tika 1.7 Release

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 6 Jan 2015, Tyler Palsulich wrote:
> A candidate for the Tika 1.7 release is available at:
>    https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>    http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
>
> The SHA1 checksum of the archive is
>    0307a8367ae6f8b1103824fd11337fd89e24e6a4.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/

Looks good to me, I'm +1

Nick

Re: [VOTE] Apache Tika 1.7 Release

Posted by Hong-Thai Nguyen <th...@gmail.com>.
Seems fine to me: +1

No major regressions on our test corpus of 23K docs:

15-01-07 18:19:27 INFO  (DocumentConversionErrorPlugin.java : 116)
[pool-3-thread-1] Summary of document conversion errors:
- pdf (4)
* (2) org.apache.tika.exception.TikaException: TIKA-198: Illegal
IOException from org.apache.tika.parser.ParserDecorator$1@4b0b2006
* (1) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@4b0b2006
* (1) org.apache.tika.exception.TikaException: Unable to extract PDF content
- ps (3)
* (3) org.apache.tika.exception.TikaException: Unable to unpack document
stream
- pptx (10)
* (9) org.apache.tika.exception.TikaException: Error creating OOXML
extractor
* (1) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@45df8db8
- doc (6)
* (6) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@58797499
- ppt (14)
* (13) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@58797499
* (1) org.apache.tika.exception.TikaException: TIKA-198: Illegal
IOException from org.apache.tika.parser.ParserDecorator$1@58797499
- xls (9)
* (9) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@58797499
- vsd (3)
* (3) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@58797499
- odp (2)
* (2) org.apache.tika.exception.TikaException: TIKA-198: Illegal
IOException from org.apache.tika.parser.ParserDecorator$1@753ce4d8
- chm (1)
* (1) org.apache.tika.exception.TikaException: CHM file extract error:
extracted Length is wrong.
- dwg (4)
* (4) org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing
version: AC1014
- pps (2)
* (2) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@58797499
- chw (1)
* (1) org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.ParserDecorator$1@a0b8fca
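
For scale, the per-format counts in the summary above can be totalled and set against the corpus size. This is only a rough sketch: the counts are copied from the log, and "23K docs" is assumed to mean exactly 23,000 documents, which the message does not state.

```python
# Per-format error counts copied from the summary above.
ERRORS_BY_FORMAT = {
    "pdf": 4, "ps": 3, "pptx": 10, "doc": 6, "ppt": 14, "xls": 9,
    "vsd": 3, "odp": 2, "chm": 1, "dwg": 4, "pps": 2, "chw": 1,
}

# Assumption: "23K docs" is taken as exactly 23,000 for this estimate.
CORPUS_SIZE = 23_000


def failure_rate(errors_by_format, corpus_size):
    """Return the fraction of documents that produced a conversion error."""
    return sum(errors_by_format.values()) / corpus_size


total = sum(ERRORS_BY_FORMAT.values())
rate = failure_rate(ERRORS_BY_FORMAT, CORPUS_SIZE)
print(f"{total} failures, {rate:.2%} of the corpus")
# prints "59 failures, 0.26% of the corpus"
```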

Thanks, Tyler,

On Tue, Jan 6, 2015 at 7:59 AM, Tyler Palsulich <tp...@apache.org>
wrote:

> Hi All,
>
> A candidate for the Tika 1.7 release is available at:
>     https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>     http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/
>
> The SHA1 checksum of the archive is
>     0307a8367ae6f8b1103824fd11337fd89e24e6a4.
>
> In addition, a staged maven repository is available here:
>
>
> https://repository.apache.org/content/repositories/orgapachetika-1006/org/apache/tika/
>
> Please vote on releasing this package as Apache Tika 1.7.
>
> The vote is open for the next 72 hours and passes if a majority of at least
> three +1 Tika PMC votes are cast.
>
>     [ ] +1 Release this package as Apache Tika 1.7
>     [ ] -1 Do not release this package because...
>
> Thanks!
> Tyler
>
> P.S. Count this as my +1!
>



-- 
--------------
Hong-Thai