Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <ji...@apache.org> on 2020/12/22 05:43:00 UTC

[jira] [Comment Edited] (MAILBOX-403) Email main body is also indexed as an attachment

    [ https://issues.apache.org/jira/browse/MAILBOX-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253204#comment-17253204 ] 

Benoit Tellier edited comment on MAILBOX-403 at 12/22/20, 5:42 AM:
-------------------------------------------------------------------

I realized that sorting emails by Display From/To is never used.

Removing this feature will not only simplify the code but also allow us to get rid of the sorting (unanalyzed) field.

I will open a separate MR for this. MR: https://github.com/linagora/james-project/pull/4154


was (Author: btellier):
I realized that sorting emails by Display From/To is never used.

Removing this feature will not only simplify the code but also allow us to get rid of the sorting (unanalyzed) field.

I will open a separate MR for this.

> Email main body is also indexed as an attachment
> ------------------------------------------------
>
>                 Key: MAILBOX-403
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-403
>             Project: James Mailbox
>          Issue Type: Bug
>            Reporter: Benoit Tellier
>            Priority: Major
>
> h2. What
> I discovered that the main body part, which holds the text of an email and is already indexed as part of the textBody/htmlBody properties, is also indexed as an attachment.
> This behaviour is functionally wrong, as it returns attachment hits for terms contained in the body of the message.
> It also causes a larger index size, meaning higher disk costs and higher latencies.
> h2. Definition of done
> Unit tests demonstrating that, in ElasticSearch, main bodies are no longer indexed as attachments.
> h2. How
> When turning child subparts into attachments (flattening), only keep MIME parts that explicitly have a Content-Disposition (either inline or attachment).
> As a side effect, this also avoids indexing multiparts as attachments (they were not filtered out before).
> Proposed fix: https://github.com/linagora/james-project/pull/4152
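
To illustrate the filtering rule described under "How", here is a minimal sketch. It is not the James implementation (James is written in Java); it uses Python's standard `email` package only to show the idea. The function name `indexable_attachments` and the sample message are hypothetical and exist purely for illustration.

```python
from email import message_from_string
from email.message import Message
from typing import List

# Hypothetical sample: a multipart/mixed message with a main text body
# (no Content-Disposition) and one part explicitly marked as an attachment.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="sep"

--sep
Content-Type: text/plain; charset=utf-8

Main body text.
--sep
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="notes.txt"

Attachment text.
--sep--
"""


def indexable_attachments(message: Message) -> List[Message]:
    """Flatten the MIME tree and keep only the parts that explicitly
    declare a Content-Disposition of "inline" or "attachment"."""
    kept = []
    for part in message.walk():  # walk() visits the root and every subpart
        # get_content_disposition() returns None when the header is absent,
        # so the main body and bare multipart containers are dropped here.
        if part.get_content_disposition() in ("inline", "attachment"):
            kept.append(part)
    return kept


attachments = indexable_attachments(message_from_string(RAW))
for part in attachments:
    print(part.get_filename())
```

With this rule the main body part is left to the textBody/htmlBody handling, and the multipart container itself is dropped because it carries no Content-Disposition header.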



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
