Posted to solr-user@lucene.apache.org by keeblerh <ke...@yahoo.com> on 2014/09/09 19:40:03 UTC

Re: ExtractingRequestHandler indexing zip files

I am also having the issue where my zip (or kmz) contents are not
being processed: only the file names are processed. Solr seems to recognize
the kmz extension and open the file, but it does not recurse into
the contents.
The patch you mention has been around for a while. I am running Solr 4.8.1,
and the Tika jar looks like version 1.5, so I would think the patch is
included already. Do I need additional configuration? My config is as
follows:
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="kmlfiles" dataSource="null" rootEntity="false"
            baseDir="mydirectory" fileName=".*\.kmz$" onError="skip"
            processor="FileListEntityProcessor" recursive="false">
      <field defs........................
      />
      <entity name="kmlImport" processor="TikaEntityProcessor"
              datasource="kmlfiles" htmlMapper="identity"
              transformer="TemplateTransformer"
              url="${kmlfiles.fileAbsolutePath}">
        <more field defs....
        />
      </entity>
    </entity>
  </document>
</dataConfig>

I am running the import with the dataImport option on the admin page. Thanks
for any assistance. I'm on a closed network, and getting patches onto it is
not trivial.




--
View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4157650.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: ExtractingRequestHandler indexing zip files

Posted by keeblerh <ke...@yahoo.com>.
Working now. FYI: posting to "update/extract" does extract the contents of a
kmz (zip), but I am still having trouble with the dataimport. I'll move to
another thread for that. Thanks, all.
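
For readers who want to reproduce the post to "update/extract" mentioned above, here is a minimal sketch. It is not the poster's actual command. The base URL, core name (collection1), and document id are assumptions for illustration; literal.id and commit are standard ExtractingRequestHandler parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(solr_url, doc_id, payload, commit=True):
    """Build a POST request for Solr's /update/extract handler.

    solr_url is the core's base URL (assumed, e.g. .../solr/collection1).
    payload is the raw bytes of the file (for example a .kmz archive).
    """
    params = {"literal.id": doc_id}  # value stored in the unique key field
    if commit:
        params["commit"] = "true"    # make the document searchable at once
    url = solr_url.rstrip("/") + "/update/extract?" + urlencode(params)
    return Request(
        url,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def post_file(solr_url, doc_id, path):
    """Read a file from disk, send it to Solr, and return the HTTP status."""
    with open(path, "rb") as handle:
        request = build_extract_request(solr_url, doc_id, handle.read())
    with urlopen(request) as response:
        return response.status
```

A call such as post_file("http://localhost:8983/solr/collection1", "doc1", "sample.kmz") would send one archive. The file name and URL here are placeholders, and whether the archive contents are indexed still depends on the Solr and Tika versions in use.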



--
View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4158207.html

Re: ExtractingRequestHandler indexing zip files

Posted by keeblerh <ke...@yahoo.com>.
Thanks for the info, Sergio. I updated my 4.8.1 version with that patch and
SOLR 4216 (which was really the same thing). It took a day to get it to
compile on my network, and it still doesn't work. Did my config file look
correct? I'm wondering if I need another param somewhere.

"Patch has to be applied to the source code and compile again Solr.war.
If you do that then it works extracting the content of documents "



--
View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4158024.html

Re: ExtractingRequestHandler indexing zip files

Posted by marotosg <ma...@gmail.com>.
Hi keeblerh,

The patch has to be applied to the source code, and Solr.war has to be
compiled again. If you do that, it extracts the content of the documents.

Regards,
Sergio



--
View this message in context: http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-indexing-zip-files-tp4138172p4157673.html