Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2022/09/19 04:08:59 UTC

[GitHub] [james-project] Arsnael commented on a diff in pull request #1175: JAMES-3812 Provide a ToPlainText mailet

Arsnael commented on code in PR #1175:
URL: https://github.com/apache/james-project/pull/1175#discussion_r973848984


##########
server/mailet/mailets/src/main/java/org/apache/james/transport/mailets/ToPlainText.java:
##########
@@ -0,0 +1,144 @@
+/****************************************************************
+ * Licensed to the Apache Software Foundation (ASF) under one   *
+ * or more contributor license agreements.  See the NOTICE file *
+ * distributed with this work for additional information        *
+ * regarding copyright ownership.  The ASF licenses this file   *
+ * to you under the Apache License, Version 2.0 (the            *
+ * "License"); you may not use this file except in compliance   *
+ * with the License.  You may obtain a copy of the License at   *
+ *                                                              *
+ *   http://www.apache.org/licenses/LICENSE-2.0                 *
+ *                                                              *
+ * Unless required by applicable law or agreed to in writing,   *
+ * software distributed under the License is distributed on an  *
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY       *
+ * KIND, either express or implied.  See the License for the    *
+ * specific language governing permissions and limitations      *
+ * under the License.                                           *
+ ****************************************************************/
+
+package org.apache.james.transport.mailets;
+
+import java.io.IOException;
+
+import javax.inject.Inject;
+import javax.mail.BodyPart;
+import javax.mail.MessagingException;
+import javax.mail.Multipart;
+import javax.mail.internet.MimeMessage;
+
+import org.apache.commons.io.IOUtils;
+import org.apache.james.util.html.HtmlTextExtractor;
+import org.apache.mailet.Mail;
+import org.apache.mailet.base.GenericMailet;
+
+/**
+ * This mailet converts the HTML parts of a message into plain text.
+ *
+ * It first looks for multipart/alternative parts containing a text/plain and a text/html part
+ * and keeps only the text/plain part. Then, in a second pass, it replaces the remaining text/html
+ * parts with their textual content, inferred by parsing the HTML content and handling common tags.
+ *
+ * Eg:
+ *
+ * <mailet matcher="All" class="ToPlainText"/>
+ *
+ * Only available for servers having JMAP, not available for JPA.
+ */
+public class ToPlainText extends GenericMailet {
+    private final HtmlTextExtractor htmlTextExtractor;
+
+    @Inject
+    public ToPlainText(HtmlTextExtractor htmlTextExtractor) {
+        this.htmlTextExtractor = htmlTextExtractor;
+    }
+
+    @Override
+    public void service(Mail mail) throws MessagingException {
+        try {
+            if (removeHtmlFromMultipartAlternative(mail.getMessage())
+                    || convertRemainingHtmlToPlainText(mail.getMessage())) {
+                mail.getMessage().saveChanges();
+            }
+        } catch (Exception e) {
+            throw new MessagingException("Exception while extracting HTML", e);
+        }
+    }
+
+    // true if the message content is mutated
+    private boolean removeHtmlFromMultipartAlternative(MimeMessage mimeMessage) throws IOException, MessagingException {
+        if (mimeMessage.getContent() instanceof Multipart) {
+            Multipart multipart = (Multipart) mimeMessage.getContent();
+            return removeHtmlFromMultipartAlternativeForContent(multipart);
+        }
+        return false;
+    }
+
+    // true if the message content is mutated
+    private boolean removeHtmlFromMultipartAlternative(BodyPart bodyPart) throws IOException, MessagingException {
+        if (bodyPart.getContent() instanceof Multipart) {
+            Multipart multipart = (Multipart) bodyPart.getContent();
+            return removeHtmlFromMultipartAlternativeForContent(multipart);
+        }
+        return false;
+    }
+
+    // true if the message content is mutated
+    private boolean removeHtmlFromMultipartAlternativeForContent(Multipart multipart) throws MessagingException, IOException {
+        boolean mutated = false;
+        if (multipart.getContentType().startsWith("multipart/alternative")) {
+            int removedParts = 0;
+            for (int i = 0; i < multipart.getCount(); i++) {
+                if (multipart.getBodyPart(i + removedParts).getContentType().startsWith("text/html")) {
+                    multipart.removeBodyPart(i + removedParts);
+                    removedParts++;
+                    mutated = true;
+                }
+            }
+        } else {
+            for (int i = 0; i < multipart.getCount(); i++) {
+                mutated = removeHtmlFromMultipartAlternative(multipart.getBodyPart(i));
+            }
+        }
+        return mutated;
+    }
+
+    // true if the message content is mutated
+    private boolean convertRemainingHtmlToPlainText(MimeMessage mimeMessage) throws IOException, MessagingException {
+        if (mimeMessage.getContentType().startsWith("text/html")) {
+            mimeMessage.setContent(htmlTextExtractor.toPlainText(IOUtils.toString(mimeMessage.getInputStream())), "text/plain");
+            return true;
+        }
+        if (mimeMessage.getContent() instanceof Multipart) {
+            boolean mutated = false;
+            Multipart multipart = (Multipart) mimeMessage.getContent();
+            for (int i = 0; i < multipart.getCount(); i++) {
+                mutated = convertRemainingHtmlToPlainText(multipart.getBodyPart(i));
+            }
+            return mutated;
+        }
+        return false;
+    }
+
+    // true if the message content is mutated
+    private boolean convertRemainingHtmlToPlainText(BodyPart bodyPart) throws IOException, MessagingException {
+        if (bodyPart.getContent() instanceof Multipart) {

Review Comment:
   The body of this if clause is identical to the one at line 112. Would it be better to extract it into a private method?
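
The extraction suggested here could look roughly like the sketch below. It is not code from the PR: `HtmlConversionSketch`, `Part`, `convertChildren` and `stripTags` are hypothetical stand-ins so the example runs without javax.mail or `HtmlTextExtractor`, and the leaf-part branch is an assumption modeled on the `MimeMessage` overload shown in the diff. The only point is that the loop over the child parts lives in one helper that both callers could share. The sketch accumulates the flag with `|=` so that an unchanged later part cannot reset an earlier `true`; that is a choice made in this sketch, not something the comment asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; the real mailet works on javax.mail objects.
final class HtmlConversionSketch {

    // Minimal model of a MIME part: a leaf has text, a container has children.
    static final class Part {
        String contentType;
        String text; // null for multipart containers
        final List<Part> children = new ArrayList<>();

        static Part leaf(String contentType, String text) {
            Part part = new Part();
            part.contentType = contentType;
            part.text = text;
            return part;
        }

        static Part multipart(String contentType, Part... parts) {
            Part part = new Part();
            part.contentType = contentType;
            part.children.addAll(Arrays.asList(parts));
            return part;
        }

        boolean isMultipart() {
            return text == null;
        }
    }

    // Entry point for any part. Returns true if the part or a descendant was mutated.
    static boolean convertRemainingHtmlToPlainText(Part part) {
        if (part.isMultipart()) {
            return convertChildren(part);
        }
        if (part.contentType.startsWith("text/html")) {
            part.text = stripTags(part.text);
            part.contentType = "text/plain";
            return true;
        }
        return false;
    }

    // The extracted helper: the loop that both if clauses would otherwise repeat.
    static boolean convertChildren(Part multipart) {
        boolean mutated = false;
        for (Part child : multipart.children) {
            // |= keeps an earlier 'true' even when a later child is unchanged.
            mutated |= convertRemainingHtmlToPlainText(child);
        }
        return mutated;
    }

    // Crude placeholder for HtmlTextExtractor: drops anything between < and >.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }
}
```

In the actual mailet the helper would presumably take a `Multipart` and be called from both the `MimeMessage` and the `BodyPart` overloads once the `instanceof Multipart` check has passed.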



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org