Posted to dev@lucene.apache.org by Emmanuel Espina <es...@gmail.com> on 2012/03/14 17:07:46 UTC

UpdateRequestProcessor to extract Solr XML from rich documents

I've created an update request processor that saves the XML
representation of each document as a file in an external directory. The
original idea was to add it to the processing chain of the
ExtractingRequestHandler to store an already parsed version of the
docs. Storing pre-parsed documents makes reindexing the entire index
faster, since it avoids the Tika phase and just sends the XML to the
standard update processor.
As a side effect, extracting the XML can make debugging rich docs easier.
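
For illustration only (this is not the attached POC): a minimal sketch of
rendering a document's fields as a Solr <add><doc> update message, the XML
that would be stored and later re-sent. The class and method names
(SolrXmlSketch, toAddXml, escape) are hypothetical, and the sketch assumes
single-valued string fields with no boosts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr and not the attached POC.
class SolrXmlSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an <add><doc>...</doc></add> message with one <field> per map entry.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "doc1");
        fields.put("title", "A & B");
        System.out.println(toAddXml(fields));
    }
}
```

A real processor would also need to handle multi-valued fields and
characters that are not legal in XML, which this sketch ignores.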

I'm attaching a first, very simple POC of it. What are your
opinions on adding this as a JIRA issue?

Thanks
Emmanuel

Re: UpdateRequestProcessor to extract Solr XML from rich documents

Posted by Mark Miller <ma...@gmail.com>.
On Mar 14, 2012, at 12:07 PM, Emmanuel Espina wrote:

> <XmlWritingUpdateProcessorFactory.java>

+1 - looks like a useful update proc. I'd make a couple of minor suggestions: check the return value of mkdirs and log an error or warning if the directory doesn't exist and can't be created, and close the file writer in a finally block.
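
A minimal sketch of those two suggestions, assuming a plain java.io writer; the class name XmlFileWriterSketch, the write method, and the use of java.util.logging are hypothetical and not taken from the attached file:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Hypothetical helper, not the attached XmlWritingUpdateProcessorFactory.
class XmlFileWriterSketch {
    private static final Logger LOG =
            Logger.getLogger(XmlFileWriterSketch.class.getName());

    // Returns true if the file was written, false if the directory is unusable.
    static boolean write(File dir, String fileName, String xml) throws IOException {
        // mkdirs() also returns false when the directory already exists,
        // so only treat it as a failure if the path is still not a directory.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            LOG.warning("Cannot create output directory " + dir.getAbsolutePath());
            return false;
        }
        Writer writer = null;
        try {
            writer = new OutputStreamWriter(
                    new FileOutputStream(new File(dir, fileName)),
                    StandardCharsets.UTF_8);
            writer.write(xml);
        } finally {
            // Close the writer even if write() throws.
            if (writer != null) {
                writer.close();
            }
        }
        return true;
    }
}
```

On Java 7 or later, a try-with-resources statement would give the same guarantee with less code.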

I'd go straight to a JIRA though.

- Mark Miller
lucidimagination.com
