Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <ji...@apache.org> on 2020/12/21 11:40:00 UTC

[jira] [Created] (MAILBOX-403) Email main body is also indexed as an attachment

Benoit Tellier created MAILBOX-403:
--------------------------------------

             Summary: Email main body is also indexed as an attachment
                 Key: MAILBOX-403
                 URL: https://issues.apache.org/jira/browse/MAILBOX-403
             Project: James Mailbox
          Issue Type: Bug
            Reporter: Benoit Tellier


## What

I discovered that the main body part, which holds the text of an email and is already indexed through the textBody/htmlBody properties, is also indexed as an attachment.

This behaviour is functionally wrong, as it returns attachment hits for terms contained in the body of the message. 

It also causes a larger index, meaning higher disk costs and higher latencies.

## Definition of done

Unit tests demonstrating that, with ElasticSearch, main bodies are no longer indexed as attachments.

## How

When turning child subparts into attachments (flattening), only keep MIME parts that explicitly carry a Content-Disposition header (either inline or attachment).

As a side effect, this also avoids indexing multiparts as attachments (they were not filtered out before).
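
The filtering rule can be sketched as follows. This is only an illustration written in Python with the standard `email` package, not the James code itself: the sample message and the helper name `attachment_parts` are made up for the example. It walks the flattened MIME tree and keeps a part only when it declares an explicit `inline` or `attachment` disposition.

```python
from email import message_from_string
from typing import List
from email.message import Message

# Hypothetical sample: a main text body plus one real attachment.
RAW = """\
MIME-Version: 1.0
Subject: report
Content-Type: multipart/mixed; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

main body text
--b1
Content-Type: text/plain; charset="utf-8"
Content-Disposition: attachment; filename="notes.txt"

attached notes
--b1--
"""


def attachment_parts(message: Message) -> List[Message]:
    """Flatten the MIME tree, keeping only parts with an explicit disposition."""
    kept = []
    # walk() yields the root and every nested subpart, depth first.
    for part in message.walk():
        # get_content_disposition() returns None when the header is absent,
        # which is the case for the main body and the multipart container here.
        if part.get_content_disposition() in ("inline", "attachment"):
            kept.append(part)
    return kept


message = message_from_string(RAW)
attachments = attachment_parts(message)
filenames = [part.get_filename() for part in attachments]
print(filenames)
```

In this sample, the multipart/mixed container and the main body carry no Content-Disposition header, so neither is kept; only `notes.txt` would go on to be indexed as an attachment. A unit test for the definition of done could assert the same thing against the indexed document.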



