Posted to user@tika.apache.org by Sreenivasa Kallu <sr...@gmail.com> on 2016/02/17 00:34:33 UTC
tika is unable to extract outlook messages
Hi,
I am currently indexing individual Outlook messages, and searching is
working fine.
I created the Solr core using the following command.
./solr create -c sreenimsg1 -d data_driven_schema_configs
I am using the following command to index individual messages.
curl "http://localhost:8983/solr/sreenimsg/update/extract?literal.id=msg9&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/home/ec2-user/msg9.msg"
This setup is working fine.
But the new requirement is to extract messages from an Outlook PST file.
I tried the following command to extract messages from the PST file.
curl "http://localhost:8983/solr/sreenimsg1/update/extract?literal.id=msg7&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/home/ec2-user/sateamc_0006.pst"
This command extracts only the high-level tags and merges all messages
into one document. I am not getting all the tags that I get when
extracting individual messages. Is the above command correct? Is the
problem that I am not using recursion? How do I add recursion to the
above command? Or is it a Tika library problem?
Please help me solve the above problem.
Thanks in advance.
--sreenivasa kallu
RE: tika is unable to extract outlook messages
Posted by "Allison, Timothy B." <ta...@mitre.org>.
See my response to your question on the Solr users’ list here: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3CCY1PR09MB0795E8DBA7B2B6603A45820EC7A80%40CY1PR09MB0795.namprd09.prod.outlook.com%3E
I don’t think this is a Tika problem. This is the standard way that Solr’s DIH handles embedded documents: it concatenates all embedded documents into one String.
If you want to treat each individual attachment as a separate file, you’ll have to do preprocessing on your pst or run Tika on your own (see the RecursiveParserWrapper, perhaps) and send documents to Solr via SolrJ (https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/).
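As a rough illustration of the preprocessing route, here is a minimal shell sketch that posts each message file separately using the same /update/extract URL that already works for single messages in the question. It assumes the PST has already been split into individual .msg files by some separate tool (how to do that is not covered in this thread), and that each file name is unique and URL-safe enough to serve as the document id. The function names `extract_url` and `index_msg_dir` and the `SOLR_BASE` and `DRY_RUN` variables are hypothetical helpers introduced only for this example.

```shell
#!/bin/sh
# Sketch: index a directory of individual .msg files, one Solr document each.
# Assumptions (not from the thread): the PST was already split into .msg
# files by a separate tool, and file names are unique and URL-safe, since
# the id is placed in the query string without any encoding.

SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Build the same /update/extract URL used for single messages in the thread.
# $1 = core name, $2 = document id
extract_url() {
    printf '%s/%s/update/extract?literal.id=%s&uprefix=attr_&fmap.content=attr_content&commit=true' \
        "$SOLR_BASE" "$1" "$2"
}

# Post every .msg file in a directory as its own document.
# Set DRY_RUN=1 to print the curl commands instead of running them.
# $1 = core name, $2 = directory containing .msg files
index_msg_dir() {
    for f in "$2"/*.msg; do
        [ -e "$f" ] || continue          # directory had no .msg files
        id=$(basename "$f" .msg)         # e.g. msg9.msg -> msg9
        url=$(extract_url "$1" "$id")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'curl "%s" -F "myfile=@%s"\n' "$url" "$f"
        else
            curl "$url" -F "myfile=@$f"
        fi
    done
}
```

After sourcing the file, something like `DRY_RUN=1; index_msg_dir sreenimsg1 /home/ec2-user/msgs` would print the commands for review first (the directory name is only an example). Committing on every request mirrors the original command but is slow for large mailboxes; dropping `commit=true` and committing once at the end is probably a better fit there.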