Posted to solr-user@lucene.apache.org by Andrey Padiy <an...@gmail.com> on 2013/09/27 18:28:42 UTC

Solr MailEntityProcessor not indexing "Content-Type: multipart/mixed;" emails

Hi,

We are trying to use DIH with MailEntityProcessor but are unable to index
emails that have a "Content-Type: multipart/mixed;" or
"Content-Type: multipart/related;" header.

The Solr logs show the correct number of emails in the inbox when the IMAP
connection is established, but only emails with "Content-Type: text/plain;"
or "Content-Type: text/html;" are indexed. No exceptions are thrown.
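
To make the difference between the two kinds of message concrete, here is a
minimal sketch using only the Python standard library. It has nothing to do
with Solr's own code and does not claim to explain the behaviour above; it
only shows that in a multipart/mixed message the text body is a nested part
rather than the top-level content. The helper name leaf_content_types and the
sample messages are hypothetical, made up for illustration.

```python
from email.message import EmailMessage


def leaf_content_types(msg):
    """Return the content types of all non-container parts, in walk order."""
    return [part.get_content_type()
            for part in msg.walk()
            if not part.is_multipart()]


# A simple message: the text body is the top-level part.
plain = EmailMessage()
plain["Subject"] = "plain example"
plain.set_content("hello")

# A message with an attachment: adding the attachment turns the top level
# into multipart/mixed, and the text body becomes a nested part.
mixed = EmailMessage()
mixed["Subject"] = "mixed example"
mixed.set_content("hello")
mixed.add_attachment(b"\x00\x01",
                     maintype="application",
                     subtype="octet-stream",
                     filename="blob.bin")  # hypothetical attachment

print(plain.get_content_type(), leaf_content_types(plain))
print(mixed.get_content_type(), leaf_content_types(mixed))
```

Any indexer that only looks at the top-level content type would see
"multipart/mixed" for the second message and would need to walk the nested
parts to reach the text body.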

I am using the out-of-the-box example config that ships with solr-4.4.0,
with the following data-config.xml:

<dataConfig>
  <document>
      <!--
        Note - In order to index attachments, set processAttachement="true" and drop
        Tika and its dependencies to example-DIH/solr/mail/lib directory
       -->
      <entity processor="MailEntityProcessor"
            user="our_email@address"
            password="password"
            host="imap.gmail.com"
            protocol="imaps"
            folders="Inbox"
            name="sample_entity"
            fetchSize="10000000"
            processAttachement="true"
    />
  </document>
</dataConfig>

Is this a known bug?

Thanks.