Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <ji...@apache.org> on 2020/12/21 11:40:00 UTC
[jira] [Created] (MAILBOX-403) Email main body is also indexed as an attachment
Benoit Tellier created MAILBOX-403:
--------------------------------------
Summary: Email main body is also indexed as an attachment
Key: MAILBOX-403
URL: https://issues.apache.org/jira/browse/MAILBOX-403
Project: James Mailbox
Issue Type: Bug
Reporter: Benoit Tellier
## What
I discovered that the main body part, holding the text of an email, and already indexed as part of textBody/htmlBody properties, is also indexed as an attachment.
This behaviour is functionally wrong, as it returns attachment hits for terms contained in the body of the message.
It also causes a larger index size, meaning higher disk costs and higher latencies.
## Definition of done
Unit tests demonstrating that, with ElasticSearch, main bodies are no longer indexed as attachments.
## How
When turning child subparts into attachments (flattening), only keep MIME parts that explicitly have a Content-Disposition (either inline or attachment).
As a side effect, this also avoids indexing multiparts as attachments (they were not filtered out before).
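
The filtering rule described above could be sketched roughly as follows. This is only an illustration, not the actual James code: `FlattenSketch`, `Part`, `flatten` and `isAttachment` are hypothetical names, the disposition is modelled as a bare type string (no `filename` or other parameters), and the recursion into nested multiparts is an assumption about what "flattening" means here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model for illustration; none of these names come from James or mime4j.
final class FlattenSketch {

    static final class Part {
        final String contentType;
        // Disposition type ("inline" or "attachment"), or null when the header is absent.
        final String disposition;
        final List<Part> children;

        Part(String contentType, String disposition, Part... children) {
            this.contentType = contentType;
            this.disposition = disposition;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // Flatten the subparts of a message, keeping only parts to index as attachments.
    static List<Part> flatten(Part message) {
        List<Part> result = new ArrayList<>();
        for (Part child : message.children) {
            collect(child, result);
        }
        return result;
    }

    private static void collect(Part part, List<Part> result) {
        if (isAttachment(part)) {
            result.add(part);
        }
        // Assumption: nested containers are still traversed, so an attachment
        // inside a multipart is found even though the multipart itself is skipped.
        for (Part child : part.children) {
            collect(child, result);
        }
    }

    // A part qualifies only if it explicitly declares inline or attachment.
    static boolean isAttachment(Part part) {
        return "inline".equalsIgnoreCase(part.disposition)
            || "attachment".equalsIgnoreCase(part.disposition);
    }
}
```

With this rule, a `multipart/mixed` message holding a `multipart/alternative` body (text/plain and text/html, no disposition) plus a PDF declared as `attachment` would yield only the PDF: the body parts and the multipart containers are skipped because none of them carries an explicit disposition.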
--
This message was sent by Atlassian Jira
(v8.3.4#803005)