Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <se...@james.apache.org> on 2022/05/05 09:18:00 UTC

[jira] [Commented] (JAMES-2080) ES mapping: avoid using nested and use object if this affect performance

    [ https://issues.apache.org/jira/browse/JAMES-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532154#comment-17532154 ] 

Benoit Tellier commented on JAMES-2080:
---------------------------------------

For the header field, we query by key and value, and we have relied on nested documents to do that without dynamic mappings.

As stated above, nested documents likely have a major impact on dataset size and indexing time.

Today I found the flattened field type (https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html), which might be usable as an alternative.

We should evaluate its impact on index size and indexing time.
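
To make the idea concrete, here is a minimal sketch of what a flattened-based layout could look like. It is not the current James mapping: the field name "headers", the helper names, and the reshaping of key/value entries into a single object are all assumptions made for illustration. The flattened type indexes leaf values as keywords, so matching is exact rather than analyzed, which is one of the things the evaluation would need to cover.

```python
from collections import defaultdict

# Hypothetical mapping fragment: the whole "headers" object becomes one
# flattened field, so no hidden nested documents are created.
FLATTENED_MAPPING = {
    "properties": {
        "headers": {"type": "flattened"}
    }
}


def headers_to_object(headers):
    """Reshape [{"key": k, "value": v}, ...] into {k: [v, ...]}.

    A flattened field is addressed by object key, so the header name has
    to become the key instead of living in a separate "key" property.
    """
    grouped = defaultdict(list)
    for entry in headers:
        values = entry["value"]
        if not isinstance(values, list):
            values = [values]
        grouped[entry["key"]].extend(values)
    return dict(grouped)


def header_term_query(name, value):
    """Build an exact-match query body on one header of the flattened field."""
    return {"query": {"term": {f"headers.{name}": value}}}
```

For example, headers_to_object applied to the two entries from the ticket description yields an object with "key1" and "key2" as keys, and header_term_query("key1", "value1") targets the "headers.key1" path.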

> ES mapping: avoid using nested and use object if this affect performance
> ------------------------------------------------------------------------
>
>                 Key: JAMES-2080
>                 URL: https://issues.apache.org/jira/browse/JAMES-2080
>             Project: James Server
>          Issue Type: Improvement
>          Components: elasticsearch
>            Reporter: Luc DUZAN
>            Priority: Major
>
> This ticket should be done after https://issues.apache.org/jira/browse/JAMES-2078.
> In our mapping we use nested for header, from, cc, bcc. In theory, nested reduces performance (hidden documents are created to handle nested values), so object should be used instead where possible.
> First, you should measure how significant the performance impact is. If the performance loss introduced by nested is significant, you should then assess the resulting loss of information and find a workaround for it, see:
> * https://www.elastic.co/guide/en/elasticsearch/reference/2.2/nested.html
> * https://www.elastic.co/guide/en/elasticsearch/reference/2.2/object.html
> For the moment, we think this loss of information is not an issue for FROM, CC, BCC.
> But it will certainly be an issue for headers. A way to work around it would be to transform the following:
> { headers: [{key: "key1", value: ["value1", "value2"]}, {key: "key2", value: "something"}] }
> To that:
> { headers: ["key1:value1", "key1:value2", "key2:something"] }
> But we still need to think about whether this will work for the kind of queries we need to run on the headers.
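
The transformation proposed in the ticket description can be sketched as follows. This is only an illustration: the function name and the use of ":" as separator come from the example above, not from James code, and a header value that itself contains ":" would still need to be handled when querying.

```python
def flatten_headers(headers):
    """Turn [{"key": k, "value": v or [v, ...]}, ...] into ["k:v", ...].

    Each (key, value) pair becomes one string, so a plain array field can
    hold the association that nested documents used to preserve.
    """
    flat = []
    for entry in headers:
        values = entry["value"]
        if not isinstance(values, list):
            values = [values]  # a single value is treated as a one-item list
        for value in values:
            flat.append(f'{entry["key"]}:{value}')
    return flat
```

With this shape, an exact match on a header would target the whole "key:value" string, which is part of what needs checking against the queries we run on headers.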


