Posted to solr-user@lucene.apache.org by Andrey Padiy <an...@gmail.com> on 2013/09/27 18:28:42 UTC
Solr MailEntityProcessor not indexing "Content-Type: multipart/mixed;" emails
Hi,
I am trying to use DIH with MailEntityProcessor, but I am unable to index emails
that have a "Content-Type: multipart/mixed;" or
"Content-Type: multipart/related;" header.
The Solr logs show the correct number of emails in the inbox when the IMAP
connection is established, but only emails with "Content-Type: text/plain;" or
"Content-Type: text/html;" are indexed. No exceptions are thrown.
I am using the out-of-the-box example config that ships with solr-4.4.0, with
the following data-config.xml:
<dataConfig>
  <document>
    <!--
      Note - In order to index attachments, set processAttachement="true" and drop
      Tika and its dependencies to example-DIH/solr/mail/lib directory
    -->
    <entity processor="MailEntityProcessor"
            user="our_email@address"
            password="password"
            host="imap.gmail.com"
            protocol="imaps"
            folders="Inbox"
            name="sample_entity"
            fetchSize="10000000"
            processAttachement="true"
            />
  </document>
</dataConfig>
Is this a known bug?
Thanks.