You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2022/08/08 13:34:25 UTC

[GitHub] [james-project] ottoka opened a new pull request, #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

ottoka opened a new pull request, #1111:
URL: https://github.com/apache/james-project/pull/1111

   DRAFT: This adds a step to analyze the provided SearchQuery criteria and sort options to figure out the highest required mail fetch type.
   
   Someone please check my classification in getFetchTypeForCriterion() and getFetchTypeForSort() - I have not worked with these before and am not sure I got them all right.
   
   Unfortunately, I cannot really test this, as the InMemoryMessageMapper used by SimpleMessageSearchIndexTest seems to ignore the fetch type completely. Any suggestions? :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on a diff in pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on code in PR #1111:
URL: https://github.com/apache/james-project/pull/1111#discussion_r941151820


##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -113,6 +114,67 @@ private static UidCriterion findConjugatedUidCriterion(List<Criterion> crits) {
         return null;
     }
     
+    /**
+     * Walks down the query tree's conjunctions to find the highest necessary mail fetch type.
+     * @param crits - list of Criterion to search from
+     * @return required fetch type - metadata, headers, or full
+     */
+    private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
+        FetchType maximum = FetchType.METADATA;
+        for (Criterion crit : crits) {
+            if (crit instanceof ConjunctionCriterion) {
+                return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
+                    .getCriteria());
+            } else {
+                maximum = maxFetchType(maximum, getFetchTypeForCriterion(crit));
+            }
+        }
+        return maximum;
+    }
+
+    private static FetchType getFetchTypeForCriterion(Criterion crit) {
+        if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
+            return FetchType.FULL;
+        } else
+        if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
+            return FetchType.HEADERS;
+        } else {
+            return FetchType.METADATA;
+        }
+    }
+
+    /**
+     * Searches a list of query sort options for the highest necessary mail fetch type.
+     * @param sorts - list of Sort to search
+     * @return required fetch type - metadata or headers
+     */
+    private static FetchType getFetchTypeForSorts(List<SearchQuery.Sort> sorts) {
+        return sorts.stream()
+            .map(SimpleMessageSearchIndex::getFetchTypeForSort)
+            .reduce(FetchType.METADATA, SimpleMessageSearchIndex::maxFetchType);
+    }
+
+    private static FetchType getFetchTypeForSort(SearchQuery.Sort sort) {
+        switch (sort.getSortClause()) {
+            case Arrival:
+            case Size:
+            case Uid:
+            case Id:
+                return FetchType.METADATA;
+            case MailboxCc:
+            case MailboxFrom:
+            case MailboxTo:
+            case BaseSubject:
+            case SentDate:
+            default:
+                return FetchType.HEADERS;
+        }
+    }
+
+    private static FetchType maxFetchType(FetchType a, FetchType b) {
+        return ArrayUtils.indexOf(FetchType.values(), a) >= ArrayUtils.indexOf(FetchType.values(), b) ? a : b;
+    }

Review Comment:
   I noticed Java Enums are actually Comparable based on their ordinal, so I will use that instead. I believe ordinal based on declaration order should be standard across Java versions. But it won't hurt to add a test four our specific use case as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1209190153

   Latest push includes suggestions so far.
   
   Re testability, I also added a second commit to make InMemoryMessageMapper honor the fetch type option - which immediately broke SimpleMessageSearchIndexTest.sortOnSentDateShouldWork(). Seems the UID optimization branch also needs a maxFetchType based on requested sort order. So apparently this was a worthwhile addition in itself. Lets see if Jenkins finds more affected tests :D


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1211714096

   Unfortunately I do not understand the implications of removing BODY fetch type enough to tackle  it :(
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1209422203

   OK, turns out honoring the fetch type breaks MemoryMessageMapperTest (and MemoryMessageWithAttachmentMapperTest), in different ways:
   1. The messagesRetrieved... tests want  bodyStartOctet to stay the same, regardless whether there actually is a body (HEADERS), or whether there are any preceding headers in the content at all (BODY). Effectively, have the messages **lie about** what they actually have in their content byte array. I can change the code to make it so, but I worry what happens if someone actually wants to retrieve the content?
   2. Some messagesRetrieved... tests basically want BODY and FULL to be the same thing in regard to findInMailbox(). A quick comparison shows that CassandraMessageMapper does exactly that, so fine. And incidentally, JPAMessageMapper also ignores the fetch type, meaning it will use a lot of memory on bulk search/retrieve operations. Oh well.
   3. The deleteMessages... tests are doing something strange during AbstractMessageMapper.updateFlag(). They first retrieve messages with fetch type METADATA and the save them via InMemoryMessageMapper.save(), which **copies the truncated message** and thus winds up with a broken message in the store?? So, this mechanism actually **relies** on MessageMappers to ignore the fetch type???
   
   At this point things are getting too weird for me, and I am not sure how all this is actually supposed to behave. Help please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
chibenwa commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1212000157

   I'll try to get my hands on it, when I got time!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on a diff in pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on code in PR #1111:
URL: https://github.com/apache/james-project/pull/1111#discussion_r945456785


##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -132,14 +192,16 @@ private Set<MailboxMessage> searchResults(MailboxSession session, Mailbox mailbo
             // only fetching this uid range
             UidRange[] ranges = uidCrit.getOperator().getRange();
             for (UidRange r : ranges) {
-                Iterator<MailboxMessage> it = mapper.findInMailbox(mailbox, MessageRange.range(r.getLowValue(), r.getHighValue()), FetchType.METADATA, UNLIMITED);
+                FetchType fetchType = maxFetchType(FetchType.METADATA, getFetchTypeForSorts(query.getSorts()));

Review Comment:
   Right, totally makes sense. I will fix it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1210356023

   > > Effectively, have the messages lie about what they actually have in their content byte array.
   > 
   > No they are supposed to tell what the underlying data is regardless of what they did actually load.
   > 
   Thanks for the explanation. In this case, I am OK with the necessary changes and will adapt the code accordingly.
   
   > > Tests basically want BODY and FULL to be the same thing in regard to findInMailbox().
   > 
   > Good catch. Maybe we should just **drop** Body fetch type and use **FULL** instead, which would simplify the code, and avoid some confusions. We don't use it (FetchType.BODY) so it effectively looks like accidental complexity we might want get rid of...
   > 
   > That would simplify your issue?
   > 
   On a quick look I saw that BODY and FULL are different things in the Cassandra Mapper when it loads the message into memory. I do wonder what the use case is for having a message with full (potentially large) body but without the headers. BODY is used in a lot of tests, and I cannot really tell if the differenc eis significant there, or if they really do want FULL and are confised by the distinction. In the latter case, I agree it makes sense to drop BODY, but that will be a bigger/different ticket.
   
   > > The deleteMessages... tests are doing something strange during AbstractMessageMapper.updateFlag(). They first retrieve messages with fetch type METADATA and the save them via InMemoryMessageMapper.save(), which copies the truncated message and thus winds up with a broken message in the store?? So, this mechanism actually relies on MessageMappers to ignore the fetch type???
   > 
   > What you describe looks like a behavior specific to the in-memory implementation (ignoring fetch type, etc..). So ok memory mailbox, implemented for testing purposes, might benefit from a rework...
   > 
   > > turns out honoring the fetch type breaks MemoryMessageMapperTest (and MemoryMessageWithAttachmentMapperTest), in different ways
   > 
   > Do we care about memory honoring the fetch type stuff? Ok, it makes it closer to real implementations (cassandra / JPA) meaning catching some issues earlier in the development cycle, but integration tests (here MPT IMAP tests?) still get us covered. While making memory fetch type aware the way Cassandra implem is would be a contribution I would support, I do not know if I would consider it 'essential'
   > 
   Agreed it is not essential, but it DID catch a possible bug, so I believe it is worth doing. After your explanations I have a better understanding of the issues involved and feel more comfortable in doing the necessary code changes.
   
   > > save them via InMemoryMessageMapper.save(), which copies the truncated message and thus winds up with a broken message in the store?? So, this mechanism actually relies on MessageMappers to ignore the fetch type???
   > 
   > We could adapt "save" behaviour to actually update message metadata instead of replacing it, if the message is already present?
   Again, thanks for the clarification. I compared with JPAMessageMapper, which does exactly that, i.e. re-fetch an existing message to update its flags and modseq. So I will do this for InMemoryMessageMapper as well, which should fix the remaining test failures in MemoryMessageMapperTest.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka merged pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka merged PR #1111:
URL: https://github.com/apache/james-project/pull/1111


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
chibenwa commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1210142364

   > Help please?
   
   I will try to look at it in the coming week.
   
   > Effectively, have the messages lie about what they actually have in their content byte array. 
   
   No they are supposed to tell what the underlying data is regardless of what they did actually load.
   
   > Tests basically want BODY and FULL to be the same thing in regard to findInMailbox().
   
   Good catch. Maybe we should just **drop** Body fetch type and use **FULL** instead, which would simplify the code, and avoid some confusions. We don't use it (FetchType.BODY) so it effectively looks like accidental complexity we might want get rid of...
   
   That would simplify your issue?
   
   > The deleteMessages... tests are doing something strange during AbstractMessageMapper.updateFlag(). They first retrieve messages with fetch type METADATA and the save them via InMemoryMessageMapper.save(), which copies the truncated message and thus winds up with a broken message in the store?? So, this mechanism actually relies on MessageMappers to ignore the fetch type???
   
   What you describe looks like a behavior specific to the in-memory implementation (ignoring fetch type, etc..). So ok memory mailbox, implemented for testing purposes, might benefit from a rework...
   
   >  turns out honoring the fetch type breaks MemoryMessageMapperTest (and MemoryMessageWithAttachmentMapperTest), in different ways
   
   Do we care about memory honoring the fetch type stuff? Ok, it makes it closer to real implementations (cassandra / JPA) meaning catching some issues earlier in the development cycle, but integration tests (here MPT IMAP tests?) still get us covered. While making memory fetch type aware the way Cassandra implem is would be a contribution I would support, I do not know if I would consider it 'essential'
   
   > save them via InMemoryMessageMapper.save(), which copies the truncated message and thus winds up with a broken message in the store?? So, this mechanism actually relies on MessageMappers to ignore the fetch type???
   
   We could adapt "save" behaviour to actually update message metadata instead of replacing it, if the message is already present?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] vttranlina commented on a diff in pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
vttranlina commented on code in PR #1111:
URL: https://github.com/apache/james-project/pull/1111#discussion_r945419891


##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -132,14 +192,16 @@ private Set<MailboxMessage> searchResults(MailboxSession session, Mailbox mailbo
             // only fetching this uid range
             UidRange[] ranges = uidCrit.getOperator().getRange();
             for (UidRange r : ranges) {
-                Iterator<MailboxMessage> it = mapper.findInMailbox(mailbox, MessageRange.range(r.getLowValue(), r.getHighValue()), FetchType.METADATA, UNLIMITED);
+                FetchType fetchType = maxFetchType(FetchType.METADATA, getFetchTypeForSorts(query.getSorts()));

Review Comment:
   Is better if we move fetchType outside the `for loop`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1211994873

   I put the InMemoryMessageMapper changes into https://github.com/ottoka/james-project/tree/JAMES-3799-MemoryMapper


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1210779409

   This is interesting. In the latest version, InMemoryMessageMapper does lookup up an existing message by mailbox+UID and updates it if it exists. This breaks MemoryMessageIdMapperTest>MessageIdMapperTest.findMailboxesShouldReturnTwoMailboxesWhenMessageExistsInTwoMailboxes().
   Turns out that when the test saves its original four messages, it puts message4 into mailbox 2. During this the message4 gets UID 1 from the AbstractMessageMappers built-in InMemoryUidProvider, since the latter knows about mailboxes and it was empty before, so UID 1 is fine.
   But then the test creates and saves copy of message1, and gives it a UID via the MessageUidProvider from the tests own InMemoryMapperProvider. MessageUidProvider is a different implementation that does not now about mailboxes but provides UIDs based on a simple counter. So message1copy **also** gets UID 1 by accident. When trying to save it in mailbox 2, it already finds message4 with UID 1 there due to the UID conflict, and updates it instead, which breaks the test.
   In contrast, the original implementation just **overwrites** message4 with message1copy in this case, which seems like a bug to me!
   
   Now I wonder if this kind of UID generator conflict can happen in production as well? I had thought a UID would be **u**nique wherever it came from. Or is it again an artifact of the simplified InMemoryXXX implementations? In the first case we should dig into the issue some more. In the second case, I believe the test should be adjusted to prevent the conflict, i.e. just fast-forwarding the tests MessageUidProvider to a non-conflicting UID. Any suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on a diff in pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
chibenwa commented on code in PR #1111:
URL: https://github.com/apache/james-project/pull/1111#discussion_r940804516


##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -113,6 +114,67 @@ private static UidCriterion findConjugatedUidCriterion(List<Criterion> crits) {
         return null;
     }
     
+    /**
+     * Walks down the query tree's conjunctions to find the highest necessary mail fetch type.
+     * @param crits - list of Criterion to search from
+     * @return required fetch type - metadata, headers, or full
+     */
+    private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
+        FetchType maximum = FetchType.METADATA;
+        for (Criterion crit : crits) {
+            if (crit instanceof ConjunctionCriterion) {
+                return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
+                    .getCriteria());
+            } else {
+                maximum = maxFetchType(maximum, getFetchTypeForCriterion(crit));
+            }
+        }
+        return maximum;
+    }
+
+    private static FetchType getFetchTypeForCriterion(Criterion crit) {
+        if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
+            return FetchType.FULL;
+        } else
+        if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
+            return FetchType.HEADERS;
+        } else {
+            return FetchType.METADATA;
+        }
+    }

Review Comment:
   If there is a conjunction criteria, we still need to take into account the next criterias.
   
   I think here is a very good example where functional programing might be more efficient? (map - reduce)
   
   For instance consider `AND(CritA, CritB), CritC` then with your current code CritC is ignored. Which is a bug.
   
   
   ```suggestion
       private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
           return crits.stream()
               .map(crit -> getFetchTypeForCriterion(crit))
               .reduce((a, b) -> maxFetchType(a, b))
               .orElse(FetchType.METADATA);
       }
   
       private static FetchType getFetchTypeForCriterion(Criterion crit) {
           if (crit instanceof ConjunctionCriterion) {
               return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
                       .getCriteria())
           }
           if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
               return FetchType.FULL;
           }
           if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
               return FetchType.HEADERS;
           } 
            return FetchType.METADATA;
       }
   
   ```



##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -113,6 +114,67 @@ private static UidCriterion findConjugatedUidCriterion(List<Criterion> crits) {
         return null;
     }
     
+    /**
+     * Walks down the query tree's conjunctions to find the highest necessary mail fetch type.
+     * @param crits - list of Criterion to search from
+     * @return required fetch type - metadata, headers, or full
+     */
+    private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
+        FetchType maximum = FetchType.METADATA;
+        for (Criterion crit : crits) {
+            if (crit instanceof ConjunctionCriterion) {
+                return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
+                    .getCriteria());
+            } else {
+                maximum = maxFetchType(maximum, getFetchTypeForCriterion(crit));
+            }
+        }
+        return maximum;
+    }
+
+    private static FetchType getFetchTypeForCriterion(Criterion crit) {
+        if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
+            return FetchType.FULL;
+        } else
+        if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
+            return FetchType.HEADERS;
+        } else {
+            return FetchType.METADATA;
+        }
+    }
+
+    /**
+     * Searches a list of query sort options for the highest necessary mail fetch type.
+     * @param sorts - list of Sort to search
+     * @return required fetch type - metadata or headers
+     */
+    private static FetchType getFetchTypeForSorts(List<SearchQuery.Sort> sorts) {
+        return sorts.stream()
+            .map(SimpleMessageSearchIndex::getFetchTypeForSort)
+            .reduce(FetchType.METADATA, SimpleMessageSearchIndex::maxFetchType);
+    }
+
+    private static FetchType getFetchTypeForSort(SearchQuery.Sort sort) {
+        switch (sort.getSortClause()) {
+            case Arrival:
+            case Size:
+            case Uid:
+            case Id:
+                return FetchType.METADATA;
+            case MailboxCc:
+            case MailboxFrom:
+            case MailboxTo:
+            case BaseSubject:
+            case SentDate:
+            default:

Review Comment:
   Shouldn't we throw explicitly on unspecified sorts to avoid later bugs to sneak in ? (fail early?)



##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -113,6 +114,67 @@ private static UidCriterion findConjugatedUidCriterion(List<Criterion> crits) {
         return null;
     }
     
+    /**
+     * Walks down the query tree's conjunctions to find the highest necessary mail fetch type.
+     * @param crits - list of Criterion to search from
+     * @return required fetch type - metadata, headers, or full
+     */
+    private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
+        FetchType maximum = FetchType.METADATA;
+        for (Criterion crit : crits) {
+            if (crit instanceof ConjunctionCriterion) {
+                return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
+                    .getCriteria());
+            } else {
+                maximum = maxFetchType(maximum, getFetchTypeForCriterion(crit));
+            }
+        }
+        return maximum;
+    }
+
+    private static FetchType getFetchTypeForCriterion(Criterion crit) {
+        if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
+            return FetchType.FULL;
+        } else
+        if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
+            return FetchType.HEADERS;
+        } else {
+            return FetchType.METADATA;
+        }
+    }
+
+    /**
+     * Searches a list of query sort options for the highest necessary mail fetch type.
+     * @param sorts - list of Sort to search
+     * @return required fetch type - metadata or headers
+     */
+    private static FetchType getFetchTypeForSorts(List<SearchQuery.Sort> sorts) {
+        return sorts.stream()
+            .map(SimpleMessageSearchIndex::getFetchTypeForSort)
+            .reduce(FetchType.METADATA, SimpleMessageSearchIndex::maxFetchType);
+    }
+
+    private static FetchType getFetchTypeForSort(SearchQuery.Sort sort) {
+        switch (sort.getSortClause()) {
+            case Arrival:
+            case Size:
+            case Uid:
+            case Id:
+                return FetchType.METADATA;
+            case MailboxCc:
+            case MailboxFrom:
+            case MailboxTo:
+            case BaseSubject:
+            case SentDate:
+            default:
+                return FetchType.HEADERS;
+        }
+    }
+
+    private static FetchType maxFetchType(FetchType a, FetchType b) {
+        return ArrayUtils.indexOf(FetchType.values(), a) >= ArrayUtils.indexOf(FetchType.values(), b) ? a : b;
+    }

Review Comment:
   While I am ok with this hacky implementation there might be concern over the overall stability of this across java versions. I would write test for it!
   
   Also consider using `MessageMapper.FetchType.BODY.ordinal()` instead of doing manual array index lookup.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on a diff in pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on code in PR #1111:
URL: https://github.com/apache/james-project/pull/1111#discussion_r941154204


##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -113,6 +114,67 @@ private static UidCriterion findConjugatedUidCriterion(List<Criterion> crits) {
         return null;
     }
     
+    /**
+     * Walks down the query tree's conjunctions to find the highest necessary mail fetch type.
+     * @param crits - list of Criterion to search from
+     * @return required fetch type - metadata, headers, or full
+     */
+    private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
+        FetchType maximum = FetchType.METADATA;
+        for (Criterion crit : crits) {
+            if (crit instanceof ConjunctionCriterion) {
+                return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
+                    .getCriteria());
+            } else {
+                maximum = maxFetchType(maximum, getFetchTypeForCriterion(crit));
+            }
+        }
+        return maximum;
+    }
+
+    private static FetchType getFetchTypeForCriterion(Criterion crit) {
+        if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
+            return FetchType.FULL;
+        } else
+        if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
+            return FetchType.HEADERS;
+        } else {
+            return FetchType.METADATA;
+        }
+    }

Review Comment:
   Oops. I originally copied it from findConjugatedUidCriterion, but then envolved it from abbreviated to complete traversal and forgot to adapt it accordingly. Nice catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1211899157

   Making InMemoryMessageMapper honor the fetch type seems to be a game of whack-a-mole. Issues crop up in parts of James I have never touched before, and I don't know how to handle them. It has reached a point where it distracts me too much from the original purpose of the PR, and the effort of making it work outweigh its usefulness to me.
   
   > Do we care about memory honoring the fetch type stuff? Ok, it makes it closer to real implementations (cassandra / JPA) meaning catching some issues earlier in the development cycle, but integration tests (here MPT IMAP tests?) still get us covered. While making memory fetch type aware the way Cassandra implem is would be a contribution I would support, I do not know if I would consider it 'essential'
   
   I will follow your reasoning here and will regrettably remove the InMemoryMessageMapper commit from this PR. If you are interested in this topic, I can keep it in a different branch in my personal repo, or create a separate PR here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
chibenwa commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1211989579

   Separate branch looks the way to go!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] quantranhong1999 commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
quantranhong1999 commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1208903835

   Read it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] ottoka commented on a diff in pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
ottoka commented on code in PR #1111:
URL: https://github.com/apache/james-project/pull/1111#discussion_r941152239


##########
mailbox/store/src/main/java/org/apache/james/mailbox/store/search/SimpleMessageSearchIndex.java:
##########
@@ -113,6 +114,67 @@ private static UidCriterion findConjugatedUidCriterion(List<Criterion> crits) {
         return null;
     }
     
+    /**
+     * Walks down the query tree's conjunctions to find the highest necessary mail fetch type.
+     * @param crits - list of Criterion to search from
+     * @return required fetch type - metadata, headers, or full
+     */
+    private static FetchType getFetchTypeForCriteria(List<Criterion> crits) {
+        FetchType maximum = FetchType.METADATA;
+        for (Criterion crit : crits) {
+            if (crit instanceof ConjunctionCriterion) {
+                return getFetchTypeForCriteria(((ConjunctionCriterion) crit)
+                    .getCriteria());
+            } else {
+                maximum = maxFetchType(maximum, getFetchTypeForCriterion(crit));
+            }
+        }
+        return maximum;
+    }
+
+    private static FetchType getFetchTypeForCriterion(Criterion crit) {
+        if (crit instanceof SearchQuery.AllCriterion || crit instanceof SearchQuery.TextCriterion) {
+            return FetchType.FULL;
+        } else
+        if (crit instanceof SearchQuery.HeaderCriterion || crit instanceof SearchQuery.MimeMessageIDCriterion) {
+            return FetchType.HEADERS;
+        } else {
+            return FetchType.METADATA;
+        }
+    }
+
+    /**
+     * Searches a list of query sort options for the highest necessary mail fetch type.
+     * @param sorts - list of Sort to search
+     * @return required fetch type - metadata or headers
+     */
+    private static FetchType getFetchTypeForSorts(List<SearchQuery.Sort> sorts) {
+        return sorts.stream()
+            .map(SimpleMessageSearchIndex::getFetchTypeForSort)
+            .reduce(FetchType.METADATA, SimpleMessageSearchIndex::maxFetchType);
+    }
+
+    private static FetchType getFetchTypeForSort(SearchQuery.Sort sort) {
+        switch (sort.getSortClause()) {
+            case Arrival:
+            case Size:
+            case Uid:
+            case Id:
+                return FetchType.METADATA;
+            case MailboxCc:
+            case MailboxFrom:
+            case MailboxTo:
+            case BaseSubject:
+            case SentDate:
+            default:

Review Comment:
   Makes sense, I will change the default case to throw an IllegalArgumentException.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
chibenwa commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1210418593

   > I do wonder what the use case is for having a message with full (potentially large) body but without the headers.
   
   That's the point, this use case do not really exist...
   
   >  but that will be a bigger/different ticket.
   
   +1
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on pull request #1111: JAMES-3799 Optimize memory requirements of SimpleMessageSearchIndex

Posted by GitBox <gi...@apache.org>.
chibenwa commented on PR #1111:
URL: https://github.com/apache/james-project/pull/1111#issuecomment-1210881635

   UID  (as defined per IMAP) are mailbox scopped...
   
   So copy/move the mail should allocate a new UID, using the UID generation mechanism of the target mailbox.
   
   Yes maybe it would be good to re-implement InMemoryMessageMappers without depending on AbstractMessageMapper (thus we could handle all subscenari in smarter ways....


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org