Posted to dev@lucene.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/03/01 20:56:00 UTC

[jira] [Comment Edited] (SOLR-12048) Cannot index formatted mail

    [ https://issues.apache.org/jira/browse/SOLR-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382624#comment-16382624 ] 

Tim Allison edited comment on SOLR-12048 at 3/1/18 8:55 PM:
------------------------------------------------------------

Or probably user@tika.apache.org :)

+1 to closing this issue and moving the discussion to the Solr user list.

In Tika <=1.17, these alternate bodies were treated as attachments, and we've fixed this for 1.18.

Make sure to set {{processAttachement}} to true if you haven't already!

from {{mail-data-config.xml}}
{noformat}
 <!--
        Note - In order to index attachments, set processAttachement="true" and drop
        Tika and its dependencies to example-DIH/solr/mail/lib directory
       -->
{noformat}
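
As a rough sketch of what that change could look like, the entity in {{mail-data-config.xml}} might read as follows. The attribute names are the ones used by the DIH mail example as far as I know; the user, password, host, and date values are placeholders for illustration, not taken from this issue.
{noformat}
<dataConfig>
  <document>
    <!-- Placeholder credentials and host: substitute your own mailbox settings. -->
    <!-- processAttachement (spelled this way in the config) is switched to "true". -->
    <entity name="sample_entity"
            processor="MailEntityProcessor"
            user="someone@example.com"
            password="changeme"
            host="imap.example.com"
            protocol="imaps"
            folders="inbox"
            fetchMailsSince="2018-01-01 00:00:00"
            batchSize="20"
            processAttachement="true"/>
  </document>
</dataConfig>
{noformat}
As the note in the config says, Tika and its dependencies also need to be in the example-DIH/solr/mail/lib directory for this to have any effect.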


was (Author: tallison@mitre.org):
Or probably user@tika.apache.org :)

In Tika <=1.17, these alternate bodies were treated as attachments, and we've fixed this for 1.18.

Make sure to change {{processAttachement}} to true if you haven't!

from {{mail-data-config.xml}}
{noformat}
 <!--
        Note - In order to index attachments, set processAttachement="true" and drop
        Tika and its dependencies to example-DIH/solr/mail/lib directory
       -->
{noformat}

> Cannot index formatted mail
> ---------------------------
>
>                 Key: SOLR-12048
>                 URL: https://issues.apache.org/jira/browse/SOLR-12048
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.1
>            Reporter: Dimitris
>            Priority: Major
>         Attachments: index_no_content.txt, index_success.txt
>
>
> Using the /example/example-DIH/solr/mail/ configuration, a Gmail mailbox has been indexed. However, only plain-text mails are indexed; formatted content is not.


