Posted to dev@tika.apache.org by Pei Chen <ch...@apache.org> on 2016/02/24 23:15:25 UTC
PDFParser in-process mode
Hi tika-dev,
Does the default PDF parser, when used through the auto-detect parser,
require Tika to run in server mode? It seems to try to open an HTTP
connection to localhost:8080 by default. Can it run in-process?
...<snip>
FileInputStream stream = new FileInputStream("src/test/resources/somepdf.pdf");
//works fine in-process with other doc types.
Tika tika = new Tika();
tika.parseToString(stream);
...<snip>
24 Feb 2016 17:06:24 WARN PhaseInterceptorChain - Interceptor for
{http://localhost:8080/processHeaderDocument}WebClient has thrown
exception, unwinding now
org.apache.cxf.interceptor.Fault: No message body writer has been
found for class org.apache.cxf.jaxrs.ext.multipart.MultipartBody,
ContentType: multipart/form-data
at org.apache.cxf.jaxrs.client.WebClient$BodyWriter.doWriteBody(WebClient.java:1220)
at org.apache.cxf.jaxrs.client.AbstractClient$AbstractBodyWriter.handleMessage(AbstractClient.java:1044)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.jaxrs.client.AbstractClient.doRunInterceptorChain(AbstractClient.java:623)
at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1084)
at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:883)
at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:854)
at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:320)
at org.apache.cxf.jaxrs.client.WebClient.post(WebClient.java:329)
at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)
at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:496)
at org.apache.tika.Tika.parseToString(Tika.java:571)
Re: PDFParser in-process mode
Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 24 Feb 2016, Pei Chen wrote:
> Does the default PDF parser, when used through the auto-detect parser,
> require Tika to run in server mode?
No
> It seems to try to open an HTTP connection to localhost:8080 by
> default. Can it run in-process?
The stacktrace shows you're not using the PDF parser:
> at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)
> at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
See https://wiki.apache.org/tika/GrobidJournalParser for how to configure
the grobid parser if you want to use it
Nick
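The reading of the stack trace above can be sketched in code: frames are listed innermost first, so the last Tika parser frame before the first CompositeParser or AutoDetectParser frame is the parser that the dispatcher selected. This is only an illustrative sketch; the class ParserFromTrace and the method selectedParser are hypothetical names, not part of Tika, and it assumes the trace is available as plain text.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: reads a stack trace as text and
// reports which concrete parser the dispatching parsers handed the document to.
public class ParserFromTrace {

    // Matches frames like
    // "at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)"
    private static final Pattern PARSE_FRAME =
            Pattern.compile("at (org\\.apache\\.tika\\.parser\\.[\\w.$]+)\\.parse\\(");

    public static Optional<String> selectedParser(String stackTrace) {
        String candidate = null;
        // Frames are listed innermost first, so walk down until a dispatcher appears.
        for (String line : stackTrace.split("\\R")) {
            Matcher m = PARSE_FRAME.matcher(line);
            if (!m.find()) {
                continue; // not a Tika parser frame
            }
            String cls = m.group(1);
            if (cls.endsWith(".CompositeParser") || cls.endsWith(".AutoDetectParser")) {
                break; // the frame just before this one is the parser that was chosen
            }
            candidate = cls;
        }
        return Optional.ofNullable(candidate);
    }

    public static void main(String[] args) {
        // Frames copied from the trace in this thread.
        String trace = String.join("\n",
                "at org.apache.cxf.jaxrs.client.WebClient.post(WebClient.java:329)",
                "at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)",
                "at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)",
                "at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)",
                "at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)",
                "at org.apache.tika.Tika.parseToString(Tika.java:496)");
        // Prints org.apache.tika.parser.journal.JournalParser
        System.out.println(selectedParser(trace).orElse("no Tika parser frame found"));
    }
}
```

For the trace in this thread the selected parser is JournalParser, which delegates to GrobidRESTParser; that delegation is where the HTTP call to localhost:8080 comes from. The sketch does not handle nested dispatch, such as a container format whose embedded documents are parsed recursively.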
Re: PDFParser in-process mode
Posted by Pei Chen <ch...@apache.org>.
Thanks Nick.
Just a copy and paste error in the email.
I was able to figure out how to bypass the JournalParser and just use the PDF parser.
--Pei