Posted to dev@tika.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2017/11/03 14:35:52 UTC
Tika 1.17?
All,
PDFBox 2.0.8 is now integrated. I want to fix TIKA-2490 before we release 1.17. Are there other issues that are blockers, or that you'd like to fix before 1.17 (TIKA-2471, maybe)?
I plan to run initial large-scale regression tests shortly for rfc822 and mbox because of TIKA-2478. I'll run the full regression tests before cutting the RC, but I want to focus on those for now. Other requests?
Cheers,
Tim
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Y. You're right. Thank you!
I think I've been avoiding that because there were some regressions in metadata-extractor the last time I looked at this. Let's hope those are gone in 2.10.1.
-----Original Message-----
From: Tyler Bui-Palsulich [mailto:tpalsulich@apache.org]
Sent: Sunday, November 12, 2017 2:54 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
TIKA-2486 might be worth blocking on since there is a CVE.
Tyler
On Nov 6, 2017 5:26 AM, "Allison, Timothy B." <ta...@mitre.org> wrote:
> Y. I'm happy enough to wait a few more days. I wasn't able to kick
> off the regression tests last week. Should I wait for the new parsers
> to run the regression tests?
>
> -----Original Message-----
> From: David Meikle [mailto:loompa@gmail.com]
> Sent: Friday, November 3, 2017 7:42 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.17?
>
> Sounds good. I have a couple of new parsers I would like to slot in,
> but I haven't had a chance in the last few months. I'll go for it over the
> weekend, if that works for you, Tim.
>
> Cheers,
> Dave
>
>
>
> On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <chris.a.mattmann@jpl.nasa.gov> wrote:
>
> > Let’s make it so
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Principal Data Scientist, Engineering Administrative Office (3010)
> > Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 180-503E, Mailstop: 180-503
> > Email: chris.a.mattmann@nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Tika 1.17?
Posted by Luís Filipe Nassif <lf...@gmail.com>.
Yes, Tim, I saw all these reporting artifacts; I agree they are good things.
2017-12-08 14:32 GMT-02:00 Allison, Timothy B. <ta...@mitre.org>:
> Thank you, Luís. I’ve finally had a chance to take a look. As exceptions
> go, the PPT is the most eye-opening. I don’t know how I didn’t catch
> those…ugh.
>
>
>
> There are a bunch more zero-byte file exceptions in attachments, but this
> is a good thing: it's just a reporting artifact, and now we can figure out
> whether those are corrupt files, missing dependencies, or something else.
>
>
>
> There are a bunch more exceptions for emf/wmf caused by “safelyAllocate”,
> which, I think, is a good thing. After the release, I’ll want to look at
> those to see if we need improvements in emf/wmf parsing, or if we need to
> bump the maximum expected byte lengths in the calls to safelyAllocate, or
> if the files are just plain corrupt.
>
>
>
> After I fix TIKA-2483, I think I’ll be good to roll rc1 for 1.17.
>
>
>
> Anything else holding us back?
>
>
>
> *From:* Luís Filipe Nassif [mailto:lfcnassif@gmail.com]
> *Sent:* Thursday, December 7, 2017 1:18 PM
> *To:* dev@tika.apache.org; Allison, Timothy B. <ta...@mitre.org>
> *Subject:* Fwd: Tika 1.17?
>
>
>
> Oh, sorry, I thought I had sent this to the dev list. Forwarding...
>
>
>
> Luis
>
>
>
> ---------- Forwarded message ----------
> From: *Allison, Timothy B.* <ta...@mitre.org>
> Date: 2017-12-07 14:10 GMT-02:00
> Subject: RE: Tika 1.17?
> To: "lfcnassif@gmail.com" <lf...@gmail.com>
>
> Agreed. Thank you! Do you mind sharing this with the list?
>
>
>
> *From:* Luís Filipe Nassif [mailto:lfcnassif@gmail.com]
> *Sent:* Thursday, December 7, 2017 10:26 AM
> *To:* Allison, Timothy B. <ta...@mitre.org>
> *Subject:* RE: Tika 1.17?
>
>
>
> Hi Tim,
>
>
>
> I don't think it is a blocker; maybe a minor regression, given that we are
> much better overall, with 20x more fixed exceptions. I sent it just so we
> are aware. There are only ~40 new exceptions with pdf, and 20x more fixed
> ones, so my vote is to go for 1.17!
>
>
>
> Luis
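Luís's go/no-go reasoning rests on counting, per file, whether an exception appeared or disappeared between the 1.16 and 1.17 runs (for ppt: 4677 fixed versus 208 new). Below is a minimal Java sketch of that tally. It assumes a hypothetical input, a map from file path to exception class name (null when the file parsed cleanly); it is not how the tika-eval reports are actually produced, and the class name is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compare per-file exception outcomes from two runs
// (e.g. an older and a newer Tika build). The input format, file path ->
// exception class name (null if the file parsed cleanly), is an assumption.
final class ExceptionDiff {

    // Files whose exception status changed between the two runs.
    static final class Counts {
        final int fixed; // failed in the old run, parsed cleanly in the new run
        final int added; // parsed cleanly in the old run, failed in the new run

        Counts(int fixed, int added) {
            this.fixed = fixed;
            this.added = added;
        }
    }

    static Counts compare(Map<String, String> oldRun, Map<String, String> newRun) {
        int fixed = 0;
        int added = 0;
        for (Map.Entry<String, String> e : oldRun.entrySet()) {
            // Only files present in both runs can be compared.
            if (!newRun.containsKey(e.getKey())) {
                continue;
            }
            boolean oldFailed = e.getValue() != null;
            boolean newFailed = newRun.get(e.getKey()) != null;
            if (oldFailed && !newFailed) {
                fixed++;
            } else if (!oldFailed && newFailed) {
                added++;
            }
        }
        return new Counts(fixed, added);
    }

    // Fixed-to-new ratio; infinite when nothing new broke.
    static double fixedToNewRatio(Counts c) {
        return c.added == 0 ? Double.POSITIVE_INFINITY : (double) c.fixed / c.added;
    }
}
```

Computed this way, the ppt numbers give 4677 / 208, a little over 22, which is in line with the "20x more fixed exceptions" estimate. Note that a file failing in both runs with different exceptions is not counted as either fixed or new in this sketch.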
>
>
>
>
>
> On Dec 7, 2017 11:47 AM, "Allison, Timothy B." <ta...@mitre.org>
> wrote:
>
> Thank you, Luís! Given where POI is in its dev cycle, should we go for a
> release of 1.17 now and then push for a 1.17.1 as soon as POI fixes this?
> Should we revert to 3.17-beta1? (wait, we can't do this because of a bug
> that prevents parsing of pptx in Solr)
>
> Or is this grave enough to wait a few months before we release 1.17?
>
> I found a zip/mime detection issue that we need to fix at the Tika level,
> but that fix is trivial.
>
>
> -----Original Message-----
> From: Luís Filipe Nassif [mailto:lfcnassif@gmail.com]
> Sent: Wednesday, December 6, 2017 9:30 AM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.17?
>
> Hi Tim,
>
> I've had a brief look at the exceptions folder. It seems we are much better
> with ppt (4677 fixed exceptions) and pdf (7798), but there are 208 new
> exceptions with ppt. I did not check the files to see if they are
> corrupted, but some common tokens were lost. Below is the most common new
> stacktrace:
>
> org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1000 on class class org.apache.poi.hslf.record.Document :
> java.lang.reflect.InvocationTargetException
> Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class class org.apache.poi.hslf.record.Environment :
> java.lang.reflect.InvocationTargetException
> Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection :
> java.lang.reflect.InvocationTargetException
> Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
> at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
> at org.apache.poi.hslf.record.Record.buildRecordAtOffset(Record.java:104)
> at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:279)
> at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:260)
> at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:166)
> at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:181)
> at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:78)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:179)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
> at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
> at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
> at org.apache.tika.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
> at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
> at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
> at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
> at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedConstructorAccessor283.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
> ... 25 more
> Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class class org.apache.poi.hslf.record.Environment :
> java.lang.reflect.InvocationTargetException
> Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection :
> java.lang.reflect.InvocationTargetException
> Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
> at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
> at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
> at org.apache.poi.hslf.record.Document.<init>(Document.java:133)
> ... 29 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedConstructorAccessor285.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
> ... 31 more
> Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection :
> java.lang.reflect.InvocationTargetException
> Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
> at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
> at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
> at org.apache.poi.hslf.record.Environment.<init>(Environment.java:54)
> ... 35 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedConstructorAccessor286.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
> ... 37 more
> Caused by: java.lang.IllegalArgumentException: typeface can't be null nor empty
> at org.apache.poi.hslf.usermodel.HSLFFontInfo.setTypeface(HSLFFontInfo.java:129)
> at org.apache.poi.hslf.usermodel.HSLFFontInfo.<init>(HSLFFontInfo.java:74)
> at org.apache.poi.hslf.record.FontCollection.<init>(FontCollection.java:47)
> ... 41 more
>
>
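The trace above nests the real failure several levels deep: each HSLFException is caused by an InvocationTargetException from a reflective constructor call, and only the innermost cause, the IllegalArgumentException thrown from HSLFFontInfo.setTypeface, says what actually went wrong. When grouping exceptions like these across a large regression run, it may help to key on that root cause. A small sketch using only the standard Throwable.getCause() chain; the RootCause class and its groupingKey format are hypothetical, not part of Tika or POI.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Sketch: follow Throwable.getCause() to the innermost cause, so that wrapper
// exceptions (HSLFException, InvocationTargetException) can be skipped when
// grouping failures. The class name and key format are hypothetical.
final class RootCause {

    static Throwable rootCause(Throwable t) {
        // Track visited throwables by identity so a cyclic cause chain terminates.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    // A compact grouping key: root exception class name plus its message.
    static String groupingKey(Throwable t) {
        Throwable root = rootCause(t);
        return root.getClass().getName() + ": " + root.getMessage();
    }
}
```

With this key, all 208 new ppt failures that share the trace above would collapse into a single "typeface can't be null nor empty" bucket, regardless of how many wrapper layers sit on top.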
> 2017-12-05 21:44 GMT-02:00 Allison, Timothy B. <ta...@mitre.org>:
>
> > Reports are here:
> >
> > http://162.242.228.174/reports/reports_Tika1_16V1_17.zip
> >
> > I haven't had a chance to look. Tomorrow...
> >
> > Let me know what you find.
> >
> > -----Original Message-----
> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
> > Sent: Wednesday, November 29, 2017 1:08 PM
> > To: dev@tika.apache.org
> > Subject: RE: Tika 1.17?
> >
> > +1
> >
> > -----Original Message-----
> > From: Chris Mattmann [mailto:mattmann@apache.org]
> > Sent: Wednesday, November 29, 2017 12:57 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.17?
> >
> > Thanks so much for fixing this. It worked during MEMEX, and I
> > think it has since fallen out of date; perhaps I committed Zarana’s
> > code wrong or something. Will be great to get this working!
> >
> >
> >
> > On 11/29/17, 9:54 AM, "David Meikle" <lo...@gmail.com> wrote:
> >
> > I am thinking TIKA-2385. I've got a resized image that I can
> > commit tonight that should close this one off.
> >
> > Cheers,
> > Dave
> >
> >
> > On 29 Nov 2017 14:42, "Allison, Timothy B." <ta...@mitre.org>
> > wrote:
> >
> > Many thanks to Bob for help on TIKA-2502!
> >
> > Anything else we want to put into 1.17 before I run the regression
> > tests?
> >
> > -----Original Message-----
> > From: Allison, Timothy B. [mailto:tallison@mitre.org]
> > Sent: Monday, November 13, 2017 1:42 PM
> > To: dev@tika.apache.org
> > Subject: RE: Tika 1.17?
> >
> > Y. You're right. Thank you!
> >
> > I think I've been avoiding that because there were some regressions in
> > metadata-extractor the last time I looked at this. Let's hope those are
> > gone in 2.10.1.
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank you, Luís. I’ve finally had a chance to take a look. As exceptions go, the PPT is the most eye-opening. I don’t know how I didn’t catch those…ugh.
There are a bunch more zero-byte file exceptions in attachments, but this is a good thing: it's just a reporting artifact, and now we can figure out whether those are corrupt files, missing dependencies, or something else.
There are a bunch more exceptions for emf/wmf caused by “safelyAllocate”, which, I think, is a good thing. After the release, I’ll want to look at those to see if we need improvements in emf/wmf parsing, or if we need to bump the maximum expected byte lengths in the calls to safelyAllocate, or if the files are just plain corrupt.
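The safelyAllocate calls refuse to allocate a buffer when a length read from the file exceeds an expected maximum, which presumably is why some corrupt emf/wmf files now surface as exceptions instead of attempting a huge allocation. Here is a hypothetical Java sketch of that kind of guard. It is not POI's implementation; the SafeAlloc name, the exception type, and the message are assumptions made for illustration.

```java
// Hypothetical sketch of a "safely allocate" guard: refuse to allocate a
// buffer whose length came from an untrusted file if it exceeds a
// caller-supplied maximum. Not POI's code; names and exception type are assumed.
final class SafeAlloc {

    static byte[] allocate(long length, int maxLength) {
        if (length < 0) {
            throw new IllegalArgumentException("Negative length: " + length);
        }
        if (length > maxLength) {
            // A corrupt or hostile file can claim an enormous record length;
            // failing fast avoids an OutOfMemoryError.
            throw new IllegalStateException("Tried to allocate " + length
                    + " bytes, but the maximum is " + maxLength
                    + ". The file may be corrupt, or the limit may need raising.");
        }
        return new byte[(int) length];
    }
}
```

Raising maxLength corresponds to the "bump the maximum expected byte lengths" option; keeping the limit and rejecting the file is the right outcome when the file really is corrupt.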
After I fix TIKA-2483, I think I’ll be good to roll rc1 for 1.17.
Anything else holding us back?
Fwd: Tika 1.17?
Posted by Luís Filipe Nassif <lf...@gmail.com>.
Oh, sorry, I thought I had sent this to the dev list. Forwarding...
Luis
---------- Forwarded message ----------
From: Allison, Timothy B. <ta...@mitre.org>
Date: 2017-12-07 14:10 GMT-02:00
Subject: RE: Tika 1.17?
To: "lfcnassif@gmail.com" <lf...@gmail.com>
Agreed. Thank you! Do you mind sharing this with the list?
*From:* Luís Filipe Nassif [mailto:lfcnassif@gmail.com]
*Sent:* Thursday, December 7, 2017 10:26 AM
*To:* Allison, Timothy B. <ta...@mitre.org>
*Subject:* RE: Tika 1.17?
Hi Tim,
I don't think it is a blocker, maybe a minor regression, given we are much
better with 20x more fixed exceptions. I sent it just to let us be aware.
There are some few ~40 new exceptions with pdf, and 20x more fixed ones, so
my vote is to go for 1.17!
Luis
Em 7 de dez de 2017 11:47 AM, "Allison, Timothy B." <ta...@mitre.org>
escreveu:
Thank you, Luís! Given where POI is in its dev cycle, should we go for a
release of 1.17 now and then push for a 1.17.1 as soon as POI fixes this?
Should we revert to 3.17-beta1? (wait, we can't do this because of a bug
that prevents parsing of pptx in Solr)
Or is this grave enough to wait a few months before we release 1.17?
I found a zip/mime detection issue that we need to fix at the Tika level,
but that fix is trivial.
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank you, Luís! Given where POI is in its dev cycle, should we go for a release of 1.17 now and then push for a 1.17.1 as soon as POI fixes this? Should we revert to 3.17-beta1? (wait, we can't do this because of a bug that prevents parsing of pptx in Solr)
Or is this grave enough to wait a few months before we release 1.17?
I found a zip/mime detection issue that we need to fix at the Tika level, but that fix is trivial.
Re: Tika 1.17?
Posted by Luís Filipe Nassif <lf...@gmail.com>.
Hi Tim,
I've had a brief look at the exceptions folder. It seems we are much better with
ppt (4677 fixed exceptions) and pdf (7798), but there are 208 new
exceptions with ppt. I did not check the files to see if they are
corrupted, but some common tokens were lost. Below is the most common new
stacktrace:
org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1000 on class class org.apache.poi.hslf.record.Document :
java.lang.reflect.InvocationTargetException
Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class class org.apache.poi.hslf.record.Environment :
java.lang.reflect.InvocationTargetException
Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection :
java.lang.reflect.InvocationTargetException
Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
at org.apache.poi.hslf.record.Record.buildRecordAtOffset(Record.java:104)
at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:279)
at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:260)
at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:166)
at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:181)
at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:78)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:179)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
at org.apache.tika.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor283.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
... 25 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class class org.apache.poi.hslf.record.Environment :
java.lang.reflect.InvocationTargetException
Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection :
java.lang.reflect.InvocationTargetException
Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
at org.apache.poi.hslf.record.Document.<init>(Document.java:133)
... 29 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor285.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
... 31 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection :
java.lang.reflect.InvocationTargetException
Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
at org.apache.poi.hslf.record.Environment.<init>(Environment.java:54)
... 35 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor286.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
... 37 more
Caused by: java.lang.IllegalArgumentException: typeface can't be null nor empty
at org.apache.poi.hslf.usermodel.HSLFFontInfo.setTypeface(HSLFFontInfo.java:129)
at org.apache.poi.hslf.usermodel.HSLFFontInfo.<init>(HSLFFontInfo.java:74)
at org.apache.poi.hslf.record.FontCollection.<init>(FontCollection.java:47)
... 41 more
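The "fixed" and "new" exception counts above come from comparing the per-file outcomes of the 1.16 and 1.17 runs. The bookkeeping behind such counts can be sketched in Java as follows. This is a hypothetical illustration, not the tika-eval implementation: the class `ExceptionDiff`, its methods, and the toy file names are invented, and it assumes each run has already been reduced to a map from file path to exception message (null when the file parsed cleanly).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: classify files as "fixed" or "new" exceptions
// by comparing two extraction runs. Not the tika-eval API.
public class ExceptionDiff {

    // Files that threw in oldRun but parsed cleanly (null value) in newRun.
    public static Set<String> fixed(Map<String, String> oldRun, Map<String, String> newRun) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : oldRun.entrySet()) {
            boolean threwBefore = e.getValue() != null;
            boolean cleanNow = newRun.containsKey(e.getKey()) && newRun.get(e.getKey()) == null;
            if (threwBefore && cleanNow) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Files that parsed cleanly in oldRun but throw in newRun (the mirror of fixed()).
    public static Set<String> introduced(Map<String, String> oldRun, Map<String, String> newRun) {
        return fixed(newRun, oldRun);
    }

    // Counts the given files by their exception message in a run,
    // e.g. to find the most common new stacktrace.
    public static Map<String, Integer> countByMessage(Set<String> files, Map<String, String> run) {
        Map<String, Integer> counts = new HashMap<>();
        for (String f : files) {
            counts.merge(run.get(f), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy data only; these are not results from the regression run.
        Map<String, String> v116 = new HashMap<>();
        Map<String, String> v117 = new HashMap<>();
        v116.put("a.ppt", "HSLFException");
        v117.put("a.ppt", null);
        v116.put("b.ppt", null);
        v117.put("b.ppt", "IllegalArgumentException: typeface can't be null nor empty");
        v116.put("c.pdf", null);
        v117.put("c.pdf", null);

        Set<String> newOnes = introduced(v116, v117);
        System.out.println(fixed(v116, v117));              // [a.ppt]
        System.out.println(newOnes);                        // [b.ppt]
        System.out.println(countByMessage(newOnes, v117));  // {IllegalArgumentException: typeface can't be null nor empty=1}
    }
}
```

Keying both runs by file path keeps the comparison per document, so a file that fails in both versions (possibly with different messages) is counted as neither fixed nor new.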
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Reports are here:
http://162.242.228.174/reports/reports_Tika1_16V1_17.zip
I haven't had a chance to look. Tomorrow...
Let me know what you find.
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Wednesday, November 29, 2017 1:08 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
+1
-----Original Message-----
From: Chris Mattmann [mailto:mattmann@apache.org]
Sent: Wednesday, November 29, 2017 12:57 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.17?
Thanks so much for fixing this. It worked during MEMEX and then I think has since fallen out of date and perhaps I committed Zarana’s code wrong or something. Will be great to get this working!
On 11/29/17, 9:54 AM, "David Meikle" <lo...@gmail.com> wrote:
I am thinking TIKA-2385. I've got a resized image that I can commit tonight
that should close this one off.
Cheers,
Dave
On 29 Nov 2017 14:42, "Allison, Timothy B." <ta...@mitre.org> wrote:
Many thanks to Bob for help on TIKA-2502!
Anything else we want to put into 1.17 before I run the regression tests?
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Monday, November 13, 2017 1:42 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
Y. You're right. Thank you!
I think I've been avoiding that because there were some regressions in
metadata-extractor last I looked at this. Let's hope those are gone in
2.10.1.
-----Original Message-----
From: Tyler Bui-Palsulich [mailto:tpalsulich@apache.org]
Sent: Sunday, November 12, 2017 2:54 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
TIKA-2486 might be worth blocking on since there is a CVE.
Tyler
On Nov 6, 2017 5:26 AM, "Allison, Timothy B." <ta...@mitre.org> wrote:
> Y. I'm happy enough to wait a few more days. I wasn't able to kick
> off the regression tests last week. Should I wait for the new parsers
> to run the regression tests?
>
> -----Original Message-----
> From: David Meikle [mailto:loompa@gmail.com]
> Sent: Friday, November 3, 2017 7:42 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.17?
>
> Sounds good. I have a couple of new parsers I would like to slot in
> but not had a chance the last few months. Will go for it over the
> weekend, if that works for you Tim.
>
> Cheers,
> Dave
>
>
>
> On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
> > Let’s make it so (
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Principal Data Scientist, Engineering Administrative Office (3010)
> > Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 180-503E, Mailstop: 180-503
> > Email: chris.a.mattmann@nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department University
> > of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> >
> > On 11/3/17, 7:35 AM, "Allison, Timothy B." <ta...@mitre.org> wrote:
> >
> > All,
> >
> > PDFBox 2.0.8 is now integrated. I want to fix TIKA-2490 before
> > we release 1.17. Are there other issues that are blockers or you'd
> > like to fix before 1.17 (TIKA-2471, maybe?)?
> >
> > I plan to run initial large scale regression tests shortly for
> > rfc822 and mbox because of TIKA-2478. I'll run the full regression
> > tests before cutting the RC, but I want to focus on those for now.
Other requests?
> >
> > Cheers,
> >
> > Tim
> >
> >
> >
>
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
I kicked off the regression tests towards the end of last week. Some of the SQL queries churn indefinitely on data of this size. I think we've maxed out H2 for our dataset... or I'm doing something inelegant/ill-advised with H2.
I've trimmed out the reports that were causing infinite(ish) hangs, and I'm now getting most of the reports that we care about.
I should have the reports ready by this evening/tomorrow.
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Wednesday, November 29, 2017 1:08 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
+1
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
+1
-----Original Message-----
From: Chris Mattmann [mailto:mattmann@apache.org]
Sent: Wednesday, November 29, 2017 12:57 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.17?
Thanks so much for fixing this. It worked during MEMEX, and I think it has since fallen out of date, or perhaps I committed Zarana’s code wrong or something. It will be great to get this working!
Re: Tika 1.17?
Posted by Chris Mattmann <ma...@apache.org>.
Thanks so much for fixing this. It worked during MEMEX, and I think it has since fallen out
of date, or perhaps I committed Zarana’s code wrong or something. It will be great to get this
working!
On 11/29/17, 9:54 AM, "David Meikle" <lo...@gmail.com> wrote:
I am thinking TIKA-2385. I've got a resized image that I can commit tonight
that should close this one off.
Cheers,
Dave
RE: Tika 1.17?
Posted by David Meikle <lo...@gmail.com>.
I am thinking TIKA-2385. I've got a resized image that I can commit tonight
that should close this one off.
Cheers,
Dave
On 29 Nov 2017 14:42, "Allison, Timothy B." <ta...@mitre.org> wrote:
Many thanks to Bob for help on TIKA-2502!
Anything else we want to put into 1.17 before I run the regression tests?
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Many thanks to Bob for help on TIKA-2502!
Anything else we want to put into 1.17 before I run the regression tests?
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Monday, November 13, 2017 1:42 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
Y. You're right. Thank you!
I think I've been avoiding that because there were some regressions in metadata-extractor last I looked at this. Let's hope those are gone in 2.10.1.
-----Original Message-----
From: Tyler Bui-Palsulich [mailto:tpalsulich@apache.org]
Sent: Sunday, November 12, 2017 2:54 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.17?
TIKA-2486 might be worth blocking on since there is a CVE.
Tyler
RE: Tika 1.17?
Posted by Tyler Bui-Palsulich <tp...@apache.org>.
TIKA-2486 might be worth blocking on since there is a CVE.
Tyler
On Nov 6, 2017 5:26 AM, "Allison, Timothy B." <ta...@mitre.org> wrote:
> Y. I'm happy enough to wait a few more days. I wasn't able to kick off
> the regression tests last week. Should I wait for the new parsers to run
> the regression tests?
RE: Tika 1.17?
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Y. I'm happy enough to wait a few more days. I wasn't able to kick off the regression tests last week. Should I wait for the new parsers to run the regression tests?
-----Original Message-----
From: David Meikle [mailto:loompa@gmail.com]
Sent: Friday, November 3, 2017 7:42 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.17?
Sounds good. I have a couple of new parsers I would like to slot in but haven't
had a chance in the last few months. Will go for it over the weekend, if that
works for you Tim.
Cheers,
Dave
Re: Tika 1.17?
Posted by David Meikle <lo...@gmail.com>.
Sounds good. I have a couple of new parsers I would like to slot in but haven't
had a chance in the last few months. Will go for it over the weekend, if that
works for you Tim.
Cheers,
Dave
On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <
chris.a.mattmann@jpl.nasa.gov> wrote:
> Let’s make it so (
Re: Tika 1.17?
Posted by "Mattmann, Chris A (3010)" <ch...@jpl.nasa.gov>.
Let’s make it so (
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 11/3/17, 7:35 AM, "Allison, Timothy B." <ta...@mitre.org> wrote:
All,
PDFBox 2.0.8 is now integrated. I want to fix TIKA-2490 before we release 1.17. Are there other issues that are blockers or you'd like to fix before 1.17 (TIKA-2471, maybe?)?
I plan to run initial large-scale regression tests shortly for rfc822 and mbox because of TIKA-2478. I'll run the full regression tests before cutting the RC, but I want to focus on those for now. Other requests?
Cheers,
Tim