Posted to dev@lucene.apache.org by "David Webb (Closed) (JIRA)" <ji...@apache.org> on 2011/11/13 21:33:51 UTC
[jira] [Closed] (SOLR-2896) TikaEntityProcessor onError not working in some cases
[ https://issues.apache.org/jira/browse/SOLR-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Webb closed SOLR-2896.
----------------------------
Resolution: Invalid
I realized I had put the onError attribute on the wrong <entity> element. It is working properly.
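For anyone who finds this later, here is a sketch of what the corrected configuration presumably looks like. It assumes the fix was to move onError from the outer "attach" entity to the nested entity that uses TikaEntityProcessor, so that a TikaException skips only the offending document. Everything else is copied from the snippet in the quoted description; the exact placement is an inference from that snippet, not a verified copy of the final config.

{code:title=data-config.xml snippet (sketch, assumed fix)}
<entity name="attach"
        query="select filename, filedata from table where id = ${parentEntity.ID}">
  <field column="filename" name="filename"/>
  <!-- Assumption: onError belongs on the entity whose processor throws the exception -->
  <entity dataSource="f2" processor="TikaEntityProcessor" onError="skip"
          url="filedata" dataField="attach.FILEDATA" format="text">
    <field column="text" name="filedata"/>
  </entity>
</entity>
{code}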
> TikaEntityProcessor onError not working in some cases
> -----------------------------------------------------
>
> Key: SOLR-2896
> URL: https://issues.apache.org/jira/browse/SOLR-2896
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 3.4
> Environment: Windows 7, JDK 1.6.0_18, Solr 3.4.0
> Reporter: David Webb
> Attachments: resume only true.doc
>
>
> When using the TikaEntityProcessor, I have a particular document (attached for testing) that causes a TikaException. Even when the onError parameter of the TikaEntityProcessor is set to "skip" or "continue", the DIH still aborts and rolls back the entire indexing process.
> {code:title=data-config.xml snippet}
> <entity name="attach" onError="skip"
>         query="select filename, filedata from table where id = ${parentEntity.ID}">
>   <field column="filename" name="filename"/>
>   <entity dataSource="f2" processor="TikaEntityProcessor" url="filedata" dataField="attach.FILEDATA" format="text">
>     <field column="text" name="filedata"/>
>   </entity>
> </entity>
> {code}
> {code}
> Nov 12, 2011 10:22:16 AM org.apache.solr.common.SolrException log
> SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 562
> at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
> at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:130)
> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
> at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
> at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
> at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
> at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
> at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@8a799a
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
> at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
> ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 29
> at org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:315)
> at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:60)
> at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
> at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
> at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
> at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:429)
> at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:419)
> at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:187)
> at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> ... 11 more
> Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: start rollback
> Nov 12, 2011 10:22:16 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: end_rollback
> {code}