Posted to solr-user@lucene.apache.org by Reeza Edah Tally <re...@nova-hub.com> on 2011/04/14 12:15:11 UTC

Does EntityProcessor honor onError=skip when nextRow() fails?

Hi,

The document that I am trying to index with DIH contains an entity with
fields queried from a DB and an entity with the content of a file extracted
with TikaEntityProcessor. I was testing the onError="skip" option with
TikaEntityProcessor and found that it does not work: it behaves like
onError="continue", i.e. the document still ends up in my index with the DB
fields but without the file content. This is a problem because it leaves my
index inconsistent with my business data.

It seems that the issue lies in EntityProcessorWrapper, which swallows
exceptions from nextRow() unless onError="abort". So is it safe to say that
this option just does not work? Can somebody please suggest an alternative
that would let me import each document on an all-or-nothing basis?
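
A minimal sketch of the behaviour in question, written as a toy model in
plain Java. It is not Solr code: OnErrorSketch, SkipDocumentException,
nextRow() and importDoc() are hypothetical names used only to contrast what
"continue" does (the document is kept without the failed entity's fields)
with what "skip" would be expected to do (the whole document is dropped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model only: none of these types are Solr classes.
class OnErrorSketch {
    enum OnError { ABORT, SKIP, CONTINUE }

    /** Hypothetical signal meaning "drop the whole document". */
    static class SkipDocumentException extends RuntimeException {
        SkipDocumentException(Throwable cause) { super(cause); }
    }

    /** Fetches a row from the child entity and applies the onError policy. */
    static String nextRow(Supplier<String> delegate, OnError onError) {
        try {
            return delegate.get();
        } catch (RuntimeException e) {
            switch (onError) {
                case ABORT:
                    throw new IllegalStateException("import aborted", e);
                case SKIP:
                    throw new SkipDocumentException(e);
                default:
                    // CONTINUE: swallow the error, the document keeps its other fields.
                    return null;
            }
        }
    }

    /** Builds one document from DB fields plus file content; returns what got indexed. */
    static List<String> importDoc(String dbFields, Supplier<String> fileEntity, OnError onError) {
        List<String> index = new ArrayList<>();
        try {
            String content = nextRow(fileEntity, onError);
            index.add(content == null ? dbFields : dbFields + "+" + content);
        } catch (SkipDocumentException e) {
            // All or nothing: the document is not indexed at all.
        }
        return index;
    }

    public static void main(String[] args) {
        Supplier<String> failing = () -> { throw new RuntimeException("parse failed"); };
        System.out.println(importDoc("db", failing, OnError.CONTINUE)); // [db]
        System.out.println(importDoc("db", failing, OnError.SKIP));     // []
    }
}
```

In this model the behaviour I am seeing corresponds to the CONTINUE branch
being taken even though "skip" was configured.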

One more observation: TikaEntityProcessor line 132 does not close the
InputStream in a finally clause, so if parsing fails the stream remains open.
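
The kind of fix meant here can be sketched as follows. This is a standalone
illustration, not a patch against TikaEntityProcessor: parse() is a
hypothetical stand-in for the real parser call, and TrackedStream exists
only so the example can show that close() runs even when parsing throws.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamCloseSketch {
    /** Stream that records whether close() was called (demonstration only). */
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackedStream(byte[] data) { super(data); }
        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical stand-in for a parser call; fails on empty input. */
    static String parse(InputStream in) throws IOException {
        if (in.read() < 0) {
            throw new IOException("nothing to parse");
        }
        return "parsed";
    }

    /** Closes the stream in a finally clause, whether or not parsing succeeds. */
    static String parseAndClose(InputStream in) throws IOException {
        try {
            return parse(in);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) {
        TrackedStream empty = new TrackedStream(new byte[0]);
        try {
            parseAndClose(empty);
        } catch (IOException e) {
            System.out.println("parse failed, closed = " + empty.closed);
        }
    }
}
```

On Java 7 or later, a try-with-resources statement would give the same
guarantee with less code.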

Thanks,

Reeza