Posted to solr-user@lucene.apache.org by Derek Croxton <cr...@yahoo.com> on 2011/04/28 03:34:18 UTC
Indexing odt files
Requesting help for someone way outside of his comfort zone. :)
I'm trying to use Solr to index several hundred OpenDocument files. I
downloaded and installed the example site and got it to work on the sample
files that come with it. I modified post.sh to change the MIME type to
vnd.oasis.opendocument.text (and I also tried x-vnd.oasis.opendocument.text).
When I try to post an .odt file, I get the error, "Error 400 Invalid UTF-8
start byte 0xba (at char #12, byte #-1)".
As best I can tell, Solr is trying to parse the file as an XML document, not
as an ODT document: when I do a hexdump of the .odt file, I do see a 0xba byte
at approximately the right position, but it isn't there in the files extracted
from the .odt archive.
I may be overlooking some configuration setting, or who knows what. I
understand the Solr setup very poorly. If anyone can help me, I would be
grateful.
Sincerely,
Derek
Re: Indexing odt files
Posted by Grijesh <pi...@gmail.com>.
Hi Derek,
Simple Post Tool is only for posting XML docs. If you want to index
OpenDocument files, you have to use the ExtractingRequestHandler (AKA Solr
Cell).
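
For illustration only, here is a rough Python sketch of posting one .odt file
to the ExtractingRequestHandler as a raw request body. It assumes the stock
example setup, where the handler is mapped at /update/extract and Solr runs at
http://localhost:8983/solr; the helper names build_extract_request and
post_odt are made up for this sketch, and using the file path as the document
id is just one possible choice. Check solrconfig.xml for the handler path your
install actually uses.

```python
import urllib.parse
import urllib.request

# Registered MIME type for OpenDocument text files.
ODT_MIME = "application/vnd.oasis.opendocument.text"


def build_extract_request(solr_base, doc_id, data, content_type=ODT_MIME):
    """Build (but do not send) a POST to the extract handler.

    Assumes the handler is mapped at /update/extract, as in the example
    solrconfig.xml. literal.id sets the unique key of the new document,
    and commit=true makes it searchable right away.
    """
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    url = solr_base.rstrip("/") + "/update/extract?" + params
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


def post_odt(path, solr_base="http://localhost:8983/solr"):
    """Read one .odt file and send its raw bytes to Solr (hypothetical helper)."""
    with open(path, "rb") as f:
        req = build_extract_request(solr_base, path, f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8", "replace")


# Example (needs a running Solr with the extract handler enabled):
# status, body = post_odt("report.odt")
```

The file goes up as binary with its own content type, so the XML update
handler never sees it, which is where the "Invalid UTF-8 start byte" error
comes from. For several hundred files, it would probably be better to drop
commit=true from each request and commit once at the end.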
-----
Thanx:
Grijesh
www.gettinhahead.co.in