Posted to solr-user@lucene.apache.org by Derek Croxton <cr...@yahoo.com> on 2011/04/28 03:34:18 UTC

Indexing odt files

Requesting help for someone way outside of his comfort zone. :)

I'm trying to use Solr to index several hundred OpenDocument files.  I 
downloaded and installed the example site and got it to work on the same files.  
I modified post.sh to change the MIME type to vnd.oasis.opendocument.text (and I 
also tried x-vnd.oasis.opendocument.text).  When I try to post a file, I get the 
error, "Error 400 Invalid UTF-8 start byte 0xba (at char #12, byte #-1)".

As best I can tell, Solr is trying to parse the file as an XML document, not an 
ODT document: when I do a hexdump of the odt file, I do see a 0xba byte at 
approximately the right position, but that byte isn't there in the extracted 
files.

I may be overlooking some configuration setting, or who knows what.  I 
understand the Solr setup very poorly.  If anyone can help me, I would be 
grateful.

 Sincerely, 
Derek

Re: Indexing odt files

Posted by Grijesh <pi...@gmail.com>.
Hi Derek,

The Simple Post Tool is only for posting XML docs. If you want to index
OpenDocument files, you have to use the ExtractingRequestHandler (AKA Solr
Cell).
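
For illustration, here is a rough sketch of what posting one .odt file to the
ExtractingRequestHandler might look like from Python. It assumes the example
site is running at http://localhost:8983/solr with the handler mapped to
/update/extract, and that the unique key field is "id" (set through the
literal.id parameter). The function names and the document id are made up for
the example, so adjust everything to your own setup.

```python
import urllib.parse
import urllib.request

# Assumption: the stock example site, with Solr Cell mapped to /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"
ODT_MIME = "application/vnd.oasis.opendocument.text"


def build_extract_request(data, doc_id, base_url=SOLR_EXTRACT_URL, commit=True):
    """Build a POST request that sends raw ODT bytes to Solr Cell.

    literal.id supplies the unique key for the document (assumed to be "id").
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": ODT_MIME},
        method="POST",
    )


def post_odt(path, doc_id):
    """Hypothetical helper: read one .odt file and send it to Solr."""
    with open(path, "rb") as f:
        req = build_extract_request(f.read(), doc_id)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling post_odt("report.odt", "report-1") would then send the file as binary
data, so it is handed to the extraction handler rather than the XML update
handler that post.sh talks to.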

-----
Thanx: 
Grijesh 
www.gettinhahead.co.in 