Posted to dev@tika.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2016/03/02 13:26:23 UTC

trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Anyone have an idea why trunk is now failing?  I couldn't find any changes between the last successful build and last night's failures that would explain this.


Test set: org.apache.tika.bundle.BundleIT
-------------------------------------------------------------------------------
Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.997 sec <<< FAILURE!
testTikaBundle(org.apache.tika.bundle.BundleIT)  Time elapsed: 2.374 sec  <<< ERROR!
java.lang.ClassNotFoundException: org.apache.cxf.jaxrs.ext.multipart.ContentDisposition not found by org.apache.tika.bundle [17]
	at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1558)
	at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:79)
	at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1998)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:69)
	at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)


-----Original Message-----
From: Hudson (JIRA) [mailto:jira@apache.org] 
Sent: Tuesday, March 01, 2016 9:59 PM
To: dev@tika.apache.org
Subject: [jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms


    [ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174937#comment-15174937 ] 

Hudson commented on TIKA-1857:
------------------------------

UNSTABLE: Integrated in tika-trunk-jdk1.7 #916 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/916/])
TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev dbefe9830b26d05f9ce53503565a069bcc63d7c1)
* tika-parsers/src/test/resources/test-documents/testPDF_XFA_govdocs1_258578.pdf
* tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
* tika-parsers/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java
TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev 7c245fa87507cf0887838001c54c65b79b7e7cbc)
* CHANGES.txt


> Enhance PDFParser to extract text from XFA forms
> ------------------------------------------------
>
>                 Key: TIKA-1857
>                 URL: https://issues.apache.org/jira/browse/TIKA-1857
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Pascal Essiembre
>              Labels: patch
>             Fix For: 1.13
>
>         Attachments: 041617_filled_out.pdf, govdocs1_xfas.zip, xfa_in_govdocs1.txt
>
>
> Extract text from PDF Forms (XFA).  Information about XFA: https://en.wikipedia.org/wiki/XFA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Yeah, maybe you’re right. Thanks for fixing it, guys.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: "Allison, Timothy B." <ta...@mitre.org>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Wednesday, March 2, 2016 at 6:30 AM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: RE: trunk build failing in bundle --, cxf class not found for
GrobidRESTParser?

>There's a chance you hadn't merged my breaking commit?


RE: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
There's a chance you hadn't merged my breaking commit?

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Wednesday, March 02, 2016 9:27 AM
To: dev@tika.apache.org
Subject: Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Wow, this is super odd. The last thing I committed was NLTK, and it built fine locally; I tested before committing.



Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Wow, this is super odd. The last thing I committed was NLTK, and it
built fine locally; I tested before committing.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++







RE: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Those lines were added 3.5 years ago: http://svn.apache.org/viewvc?view=revision&revision=1369624

-----Original Message-----
From: Bob Paulin [mailto:bob@bobpaulin.com] 
Sent: Wednesday, March 02, 2016 8:47 AM
To: dev@tika.apache.org
Subject: Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

I saw it on the 2.x branch but now that you mention it's also happening in trunk I think I see the issue.  The change to the PDFParser includes adding dependencies in the javax.xml.stream package.  The tika-bundle currently has that package marked optional:

javax.xml.stream;version="[1.0,2)";resolution:=optional,

This means that the bundle will start even if this package is missing.  However, it's now required for the PDFParser to work, so my guess is that the PDFParser is not instantiating correctly and parsing is dropping into the JournalParser, which is also coded to handle PDFs.  The JournalParser suffers a similar fate because org.apache.cxf.jaxrs.ext.multipart is optional for the GrobidRESTParser, which gets instantiated in the parse method.

So I tried removing:
javax.xml.stream;version="[1.0,2)";resolution:=optional,
javax.xml.stream.events;version="[1.0,2)";resolution:=optional,
javax.xml.stream.util;version="[1.0,2)";resolution:=optional,
from the tika-bundle/pom.xml and it worked!  So, seeing that javax.xml.stream is provided by the JDK, I'm a bit curious what those statements were doing there to begin with.  Anyone know?

- Bob
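
To illustrate the failure mode described above: with resolution:=optional, a bundle can start even when the package is missing, so the problem only shows up when a class from that package is first loaded. A minimal Java sketch of probing for such a class before relying on it might look like the following. The class name OptionalDependencyProbe and the method isAvailable are hypothetical and are not part of Tika or OSGi; only Class.forName is a real JDK call, and this is an illustration under those assumptions rather than the fix that was applied.

```java
public class OptionalDependencyProbe {

    /**
     * Hypothetical helper: returns true if the named class can be loaded
     * through the given class loader. The class is not initialized.
     */
    public static boolean isAvailable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // An optional package that was never wired ends up here at use time.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = OptionalDependencyProbe.class.getClassLoader();
        // A class that ships with every JDK.
        System.out.println(isAvailable("java.lang.String", loader));
        // A made-up class name standing in for a missing optional dependency.
        System.out.println(isAvailable("org.example.DoesNotExist", loader));
    }
}
```

A parser that depends on an optional package could use a check like this to decline work early, instead of failing with ClassNotFoundException in the middle of a parse call.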



Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Posted by Bob Paulin <bo...@bobpaulin.com>.
Also, as a follow-up: this means that the JournalParser would never 
have worked in tika-bundle, since the org.apache.cxf.jaxrs.ext.multipart 
package is required for the GrobidRESTParser to run.  Is there a reason 
this was not included? I'm guessing the cxf-rt-rs-client dependency may 
have caused problems with other parsers.

Now that the parsers are broken out into projects in the 2.x branch, 
we could create bundles for each of them, which would allow the 
JournalParser to have org.apache.cxf.jaxrs.ext.multipart embedded 
without impacting the other parsers.  I've stubbed out what this might 
look like in the 2.x branch under the tika-parsers-bundle folder.  Each 
bundle has its dependencies embedded and inlined (similar to tika-bundle).  
I've also provided tests to make sure each bundle starts and has a service 
registered for each parser.  Thoughts on this approach?  Tracking this in:

https://issues.apache.org/jira/browse/TIKA-1860

- Bob



RE: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
So it was my fault...argh...unintended consequences...  Thank you!

-----Original Message-----
From: Bob Paulin [mailto:bob@bobpaulin.com] 
Sent: Wednesday, March 02, 2016 8:47 AM
To: dev@tika.apache.org
Subject: Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

I saw it on the 2.x branch but now that you mention it's also happening in trunk I think I see the issue.  The change to the PDFParser includes adding dependencies in the javax.xml.stream package.  The tika-bundle currently has that package marked optional:

javax.xml.stream;version="[1.0,2)";resolution:=optional,

This means that the bundle will start even if this package is missing.  However, it's now required for the PDFParser to work, so my guess is that the PDFParser is not instantiating correctly and parsing is dropping into the JournalParser, which is also coded to handle PDFs.  The JournalParser suffers a similar fate because org.apache.cxf.jaxrs.ext.multipart is optional for the GrobidRESTParser, which gets instantiated in the parse method.

So I tried removing:
javax.xml.stream;version="[1.0,2)";resolution:=optional,
javax.xml.stream.events;version="[1.0,2)";resolution:=optional,
javax.xml.stream.util;version="[1.0,2)";resolution:=optional,
from the tika-bundle/pom.xml and it worked!  So, seeing that javax.xml.stream is provided by the JDK, I'm a bit curious what those statements were doing there to begin with.  Anyone know?

- Bob

On 3/2/2016 6:26 AM, Allison, Timothy B. wrote:
> Anyone have an idea why trunk is now failing?  I couldn't find any changes between the last successful build and last night's failures that would explain this.
>
>
> Test set: org.apache.tika.bundle.BundleIT
> ----------------------------------------------------------------------
> --------- Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time 
> elapsed: 21.997 sec <<< FAILURE!
> testTikaBundle(org.apache.tika.bundle.BundleIT)  Time elapsed: 2.374 sec  <<< ERROR!
> java.lang.ClassNotFoundException: org.apache.cxf.jaxrs.ext.multipart.ContentDisposition not found by org.apache.tika.bundle [17]
> 	at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1558)
> 	at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:79)
> 	at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1998)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:69)
> 	at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>
> -----Original Message-----
> From: Hudson (JIRA) [mailto:jira@apache.org]
> Sent: Tuesday, March 01, 2016 9:59 PM
> To: dev@tika.apache.org
> Subject: [jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms
>
>
>      [ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174937#comment-15174937 ]
>
> Hudson commented on TIKA-1857:
> ------------------------------
>
> UNSTABLE: Integrated in tika-trunk-jdk1.7 #916 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/916/])
> TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev dbefe9830b26d05f9ce53503565a069bcc63d7c1)
> * tika-parsers/src/test/resources/test-documents/testPDF_XFA_govdocs1_258578.pdf
> * tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
> * tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java
> TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev 7c245fa87507cf0887838001c54c65b79b7e7cbc)
> * CHANGES.txt
>
>
>> Enhance PDFParser to extract text from XFA forms
>> ------------------------------------------------
>>
>>                  Key: TIKA-1857
>>                  URL: https://issues.apache.org/jira/browse/TIKA-1857
>>              Project: Tika
>>           Issue Type: Improvement
>>           Components: parser
>>             Reporter: Pascal Essiembre
>>               Labels: patch
>>              Fix For: 1.13
>>
>>          Attachments: 041617_filled_out.pdf, govdocs1_xfas.zip, xfa_in_govdocs1.txt
>>
>>
>> Extract text from PDF Forms (XFA).  Information about XFA: https://en.wikipedia.org/wiki/XFA
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)

