Posted to user@tika.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2012/06/19 03:12:25 UTC
Where to start with ClassCastException: org.apache.pdfbox.cos.COSFloat?
Hi,
I get this exception (see the stack trace below) even though my code
appears to catch it. Here is the code:
public void parse(String fileName, Metadata metadata) {
    TikaInputStream inputStream = null;
    try {
        // the given input stream is closed by the parseToString method
        // (see Tika documentation)
        // we will close it just in case :)
        inputStream = TikaInputStream.get(new File(fileName));
        String text = tika.parseToString(inputStream, metadata); // <-- exception happens here
        metadata.set(DocumentMetadataKeys.DOCUMENT_TEXT, text);
    } catch (Exception e) {
        // the show must still go on
        History.appendToHistory("Exception: " + e.getMessage());
        metadata.set(DocumentMetadataKeys.PROCESSING_EXCEPTION, e.getMessage());
    } catch (OutOfMemoryError m) {
        History.appendToHistory("Memory Exception: " + m.getMessage());
        metadata.set(DocumentMetadataKeys.PROCESSING_EXCEPTION, m.getMessage());
    } finally {
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (Exception e) {
                e.printStackTrace(System.out);
            }
        }
    }
}
2012-06-19 00:47:06,425 WARN org.apache.pdfbox.util.PDFStreamEngine: java.lang.ClassCastException: org.apache.pdfbox.cos.COSFloat cannot be cast to org.apache.pdfbox.cos.COSName
java.lang.ClassCastException: org.apache.pdfbox.cos.COSFloat cannot be cast to org.apache.pdfbox.cos.COSName
    at org.apache.pdfbox.util.operator.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:48)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:96)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:136)
    at org.apache.tika.parser.mail.MailContentHandler.body(MailContentHandler.java:82)
    at org.apache.james.mime4j.parser.MimeStreamParser.parse(MimeStreamParser.java:133)
    at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:76)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.Tika.parseToString(Tika.java:380)
    at org.freeeed.main.DocumentParser.parse(DocumentParser.java:33)
This happens while testing on the Enron data set.
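One possible reading, not confirmed anywhere in this thread: the trace is prefixed with a WARN log line from PDFStreamEngine, which would be consistent with the library catching the exception itself and logging it rather than letting it propagate. In that case the caller's catch block never runs, even though a stack trace shows up in the log. The sketch below illustrates only that general pattern; libraryParse, callerSawException and libraryLog are hypothetical names, not Tika or PDFBox APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class SwallowedExceptionDemo {

    // Stand-in for a library-side log; a real library would use a logging framework.
    static final List<String> libraryLog = new ArrayList<>();

    // Hypothetical library method: catches its own failure, logs it, returns normally.
    static String libraryParse(Object operand) {
        try {
            // Throws ClassCastException when operand is not a String.
            return (String) operand;
        } catch (ClassCastException e) {
            libraryLog.add("WARN " + e.getClass().getName());
            return "";
        }
    }

    // Caller with its own catch block, similar in shape to the parse() method above.
    static boolean callerSawException(Object operand) {
        try {
            libraryParse(operand);
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        boolean seen = callerSawException(Float.valueOf(1.5f));
        System.out.println("caller saw exception: " + seen);
        System.out.println("library log: " + libraryLog);
    }
}
```

If this is what is happening, the logged trace is informational, and the thing to check is whether the extracted text for that document is complete.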
Re: Where to start with ClassCastException: org.apache.pdfbox.cos.COSFloat?
Posted by Mark Kerzner <ma...@shmsoft.com>.
Thank you, Jon. I have now switched from 1.0 to 1.1; let's see what that
gives me:
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.1</version>
</dependency>
Sincerely,
Mark
Re: Where to start with ClassCastException: org.apache.pdfbox.cos.COSFloat?
Posted by Jon Gorrono <jp...@ucdavis.edu>.
This could be the following issue:
https://issues.apache.org/jira/browse/PDFBOX-455
... although the stack trace differs a bit.
That issue is flagged as fixed in version 0.8.0-incubator.
It looks like tika-parsers v0.5 referenced an earlier PDFBox version (0.7.3).
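Since the question comes down to which PDFBox version actually ends up on the classpath, a small check like the one below may help. It is only a sketch: the class WhichJar and its describe method are made up for illustration, and PDDocument is used as a representative PDFBox class on the assumption that PDFBox is on the classpath. The version is read from the jar manifest and may be null if the manifest does not record one.

```java
import java.security.CodeSource;

public class WhichJar {

    // Reports the manifest version and the load location of the named class.
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className);
            Package p = c.getPackage();
            // May be null when the jar manifest carries no Implementation-Version.
            String version = (p == null) ? null : p.getImplementationVersion();
            // CodeSource is null for classes loaded by the bootstrap loader.
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "(unknown or JDK runtime)"
                    : src.getLocation().toString();
            return className + " version=" + version + " from=" + location;
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Assumption: PDDocument stands in for "some class shipped in the PDFBox jar".
        System.out.println(describe("org.apache.pdfbox.pdmodel.PDDocument"));
    }
}
```

For a Maven build, running mvn dependency:tree is another way to see which pdfbox version is resolved through tika-parsers.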
--
Jon Gorrono
PGP Key: 0x5434509D -
http{pgp.mit.edu:11371/pks/lookup?search=0x5434509D&op=index}
GSWoT Introducer - {GSWoT:US75 5434509D Jon P. Gorrono <jpgorrono -
www.gswot.org>}
http{middleware.ucdavis.edu}