
Posted to commits@ponymail.apache.org by hu...@apache.org on 2020/08/14 11:36:49 UTC

[incubator-ponymail-foal] branch master updated (e679d97 -> f71908c)

This is an automated email from the ASF dual-hosted git repository.

humbedooh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git.


    from e679d97  remove pip, it happens automagically apparently. Add a noop main script
     new cc81139  Commit initial archiver plus setup scripts for Foal.
     new f71908c  Switch to mypy testing

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .travis.yml                 |   5 +-
 tools/archiver.py           | 905 ++++++++++++++++++++++++++++++++++++++++++++
 tools/plugins/generators.py | 400 ++++++++++++++++++++
 tools/ponymail.yaml         |  44 +++
 tools/setup.py              | 481 +++++++++++++++++++++++
 5 files changed, 1834 insertions(+), 1 deletion(-)
 create mode 100755 tools/archiver.py
 create mode 100644 tools/plugins/generators.py
 create mode 100644 tools/ponymail.yaml
 create mode 100755 tools/setup.py
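tools/archiver.py, the largest file in this push, canonicalizes every List-ID through its `normalize_lid` helper before indexing. The following is a minimal standalone sketch of those same steps, handy for checking how a given List-ID header will be stored. One deliberate difference, which is an assumption of this sketch: it raises `ValueError` on an invalid ID, whereas the committed helper prints a message and calls `sys.exit(-1)`.

```python
import re


def normalize_lid(lid: str) -> str:
    """Sketch of archiver.py's normalize_lid: return a List-ID as <a.b.c>."""
    # Drop a quoted description, e.g. "Dev List" <dev.example.org>
    m = re.match(r'".*"\s+(.+)', lid)
    if m:
        lid = m.group(1)
    # Keep only what sits between the angle brackets, if any
    m = re.search(r"<(.+)>", lid)
    if m:
        lid = m.group(1)
    # Strip stray spaces/brackets and turn list@domain into list.domain
    lid = "<%s>" % lid.strip(" <>").replace("@", ".")
    # A valid ID needs at least one dot between the brackets.
    # Assumption: raise instead of sys.exit(-1) as the committed code does.
    if not re.match(r"^<.+\..+>$", lid):
        raise ValueError("Invalid list-id %s" % lid)
    return lid


print(normalize_lid('"Dev List" <dev.example.org>'))  # <dev.example.org>
print(normalize_lid("dev@example.org"))               # <dev.example.org>
```

Both header styles end up under the same key, which is why mails sent with `list@domain` and `<list.domain>` forms land in one archive.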

[incubator-ponymail-foal] 01/02: Commit initial archiver plus setup scripts for Foal.

Posted by hu...@apache.org.

humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git

commit cc811396844ffb7eb72ba9ee1bab3f6a331d93f2
Author: Daniel Gruno <hu...@apache.org>
AuthorDate: Fri Aug 14 13:34:54 2020 +0200

    Commit initial archiver plus setup scripts for Foal.
    
    This includes the proposed DKIM generator, with unit tests for it.
    Also has a sample ponymail.yaml config file, so tests will work.
    All scripts have been heavily cleaned up and bug-fixed, but are not
    production ready just yet (and the UI is still missing!).
---
 tools/archiver.py           | 905 ++++++++++++++++++++++++++++++++++++++++++++
 tools/plugins/generators.py | 400 ++++++++++++++++++++
 tools/ponymail.yaml         |  44 +++
 tools/setup.py              | 481 +++++++++++++++++++++++
 4 files changed, 1830 insertions(+)
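Two storage details in the archiver.py diff are easy to miss: each attachment is keyed by the SHA-256 hex digest of its decoded payload (`parse_attachment` / `message_attachments`), and the raw message is stored verbatim only if it is pure ASCII, otherwise as base64 (`mbox_source`). A minimal sketch of both using only the standard library; the names `source_text` and `attachments_by_hash` are hypothetical, and unlike the committed code this sketch returns attachment metadata only, not the base64-encoded contents.

```python
import base64
import hashlib
from email.message import EmailMessage


def source_text(raw: bytes) -> str:
    """Hypothetical helper: keep raw source as-is if ASCII, else base64-encode it."""
    try:
        return raw.decode("ascii", errors="strict")
    except UnicodeError:
        # Non-ASCII bytes might be corrupted on re-encoding, so store base64
        return base64.standard_b64encode(raw).decode("ascii")


def attachments_by_hash(msg) -> dict:
    """Hypothetical helper: map sha256 digest -> metadata for named attachment/inline parts."""
    found = {}
    for part in msg.walk():
        disposition = str(part.get("Content-Disposition", "")).split(";")[0].strip().lower()
        if disposition not in ("attachment", "inline"):
            continue
        payload = part.get_payload(decode=True)
        filename = part.get_filename()
        # Parts with no decodable payload or no filename are skipped, as in archiver.py
        if payload is None or not filename:
            continue
        found[hashlib.sha256(payload).hexdigest()] = {
            "content_type": part.get_content_type(),
            "size": len(payload),
            "filename": filename,
        }
    return found


msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("see attached")
msg.add_attachment(b"hello world", maintype="application",
                   subtype="octet-stream", filename="hello.bin")

print(attachments_by_hash(msg))
print(source_text(b"plain ascii"))       # plain ascii
print(source_text("é".encode("utf-8")))  # w6k=
```

Keying by content digest means the same file attached to several mails is stored once in the `-attachment` index.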

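The `Archiver.index()` wrapper in the diff exists because the `consistency` write parameter accepted by Elasticsearch 2.x was, as far as we know, replaced by `wait_for_active_shards` from 5.0 onwards. For ES 5 to 7 the wrapper always drops `consistency` and, when it was set, substitutes the `wait` value from ponymail.yaml (default 1). A minimal sketch of just that argument rewriting, with no Elasticsearch client involved; `translate_index_kwargs` is a hypothetical name.

```python
def translate_index_kwargs(es_major: int, wait_for_active_shards=None, **kwargs) -> dict:
    """Hypothetical helper: rewrite index() kwargs the way Archiver.index does."""
    if es_major in (5, 6, 7):
        # 'consistency' is always removed; only if it was set (truthy) and a
        # wait value is configured is wait_for_active_shards passed instead.
        if kwargs.pop("consistency", None) and wait_for_active_shards:
            kwargs["wait_for_active_shards"] = wait_for_active_shards
    # Other versions (ES 2.x in archiver.py) keep the arguments untouched
    return kwargs


print(translate_index_kwargs(7, 1, index="ponymail-mbox", consistency="quorum"))
# {'index': 'ponymail-mbox', 'wait_for_active_shards': 1}
print(translate_index_kwargs(2, None, index="ponymail-mbox", consistency="quorum"))
# {'index': 'ponymail-mbox', 'consistency': 'quorum'}
```

Callers can therefore always pass `consistency=self.consistency` and let the wrapper adapt it to the installed client's major version.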
diff --git a/tools/archiver.py b/tools/archiver.py
new file mode 100755
index 0000000..216dffa
--- /dev/null
+++ b/tools/archiver.py
@@ -0,0 +1,905 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+""" Publish notifications about mails to pony mail.
+
+Copy this file to $mailman_plugin_dir/mailman_ponymail/__init__.py
+Also copy ponymail.yaml to that dir.
+Enable the module by adding the following to your mailman.cfg file::
+
+[archiver.ponymail]
+# The class implementing the IArchiver interface.
+class: mailman_ponymail_plugin.Archiver
+enable: yes
+
+and by adding the following to ponymail.yaml:
+
+mailman:
+  plugin: true
+
+OR, to use the STDIN version (non-MM3 mailing list managers),
+subscribe a user to the list(s) and add this to that user's .forward file:
+"|/usr/bin/env python3 /path/to/archiver.py"
+
+"""
+
+import argparse
+import base64
+import collections
+import email.header
+import email.utils
+import fnmatch
+import hashlib
+import json
+import logging
+import os
+import re
+import sys
+import time
+import traceback
+import typing
+import uuid
+
+import certifi
+import chardet
+import elasticsearch
+import formatflowed
+import netaddr
+import yaml
+
+import plugins.generators
+
+# Fetch config from same dir as archiver.py
+config_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "ponymail.yaml")
+config = yaml.safe_load(open(config_path))
+
+# Set some vars before we begin
+archiver_generator = config["archiver"].get(
+    "generator", "full"
+)  # Fall back to full hashing if nothing is set.
+logger = None
+ES_MAJOR = elasticsearch.VERSION[0]
+auth = None
+if config["elasticsearch"].get("user"):
+    auth = (
+        config["elasticsearch"].get("user"),
+        config["elasticsearch"].get("password"),
+    )
+
+# If MailMan is enabled, import and set it up
+if config.get("mailman") and config["mailman"].get("plugin"):
+    from mailman.interfaces.archiver import ArchivePolicy, IArchiver
+    from zope.interface import implementer
+
+    logger = logging.getLogger("mailman.archiver")
+
+# Access URL once archived
+aURL = config.get("archiver", {}).get("baseurl")
+
+
+def encode_base64(buff: bytes) -> str:
+    """ Convert bytes to base64 as text string (no newlines) """
+    return base64.standard_b64encode(buff).decode("ascii", "ignore")
+
+
+def parse_attachment(
+    part: email.message.Message,
+) -> typing.Tuple[typing.Optional[dict], typing.Optional[str]]:
+    """
+    Parses an attachment in an email, turns it into a dict with a content-type, sha256 digest, file size and file name.
+    Also returns the attachment contents as base64 encoded string.
+    :param part: The message part to parse
+    :return: attachment info and contents as b64 string
+    """
+    cd = part.get("Content-Disposition", None)
+    if cd:
+        # Use str() in case the name is not in ASCII.
+        # In such cases, the get() method returns a Header not a string
+        dispositions = str(cd).strip().split(";")
+        cdtype = dispositions[0].lower()
+        if cdtype == "attachment" or cdtype == "inline":
+            fd = part.get_payload(decode=True)
+            # Allow for empty string
+            if fd is None:
+                return None, None
+            filename = part.get_filename()
+            if filename:
+                attachment = {
+                    "content_type": part.get_content_type(),
+                    "size": len(fd),
+                    "filename": filename,
+                }
+                h = hashlib.sha256(fd).hexdigest()
+                b64 = encode_base64(fd)
+                attachment["hash"] = h
+                return attachment, b64  # Return meta data and contents separately
+    return None, None
+
+
+def pm_charsets(msg: email.message.Message) -> typing.Set[str]:
+    """
+    Figures out and returns all character sets for a message or message part
+    :param msg: The email or message part to analyze
+    :return: all found charsets
+    """
+    charsets = set({})
+    for c in msg.get_charsets():
+        if c is not None:
+            charsets.update([c])
+    return charsets
+
+
+def normalize_lid(lid: str) -> str:  # N.B. Also used by import-mbox.py
+    """ Ensures that a List ID is in standard form, i.e. <a.b.c.d> """
+    # If of format "list name" <foo.bar.baz>
+    # we crop away the description (#511)
+    m = re.match(r'".*"\s+(.+)', lid)
+    if m:
+        lid = m.group(1)
+    # Drop <> and anything before/after, if found
+    m = re.search(r"<(.+)>", lid)
+    if m:
+        lid = m.group(1)
+    # Belt-and-braces: remove possible extraneous chars
+    lid = "<%s>" % lid.strip(" <>").replace("@", ".")
+    if not re.match(r"^<.+\..+>$", lid):
+        print("Invalid list-id %s" % lid)
+        sys.exit(-1)
+    return lid
+
+
+def message_attachments(msg: email.message.Message) -> typing.Tuple[list, dict]:
+    """
+    Parses an email and returns all attachments found as a tuple of metadata and contents
+    :param msg: The email to parse
+    :return: a tuple of attachment metadata and their content
+    """
+    attachments = []
+    contents = {}
+    for part in msg.walk():
+        part_meta, part_file = parse_attachment(part)
+        if part_meta:
+            attachments.append(part_meta)
+            contents[part_meta["hash"]] = part_file
+    return attachments, contents
+
+
+class Archiver(object):  # N.B. Also used by import-mbox.py
+    """The general archiver class. Compatible with MailMan3 archiver classes."""
+
+    if config.get("mailman") and config["mailman"].get("plugin"):
+        implementer(IArchiver)
+
+    # This is a list of headers which are stored in msg_metadata
+    keys = [
+        "archived-at",
+        "delivered-to",
+        "from",
+        "cc",
+        "content-type",
+        "to",
+        "date",
+        "in-reply-to",
+        "message-id",
+        "subject",
+        "references",
+        # The following don't appear to be needed currently
+        "x-message-id-hash",
+        "x-mailman-rule-hits",
+        "x-mailman-rule-misses",
+    ]
+
+    def index(self, **kwargs):
+        """ Intercept index calls and fix up the consistency argument
+        (ES 5-7 take wait_for_active_shards instead) """
+        if ES_MAJOR in [5, 6, 7]:
+            if kwargs.pop("consistency", None):  # drop the key if present
+                if self.wait_for_active_shards:  # replace with wait if defined
+                    kwargs["wait_for_active_shards"] = self.wait_for_active_shards
+        return self.es.index(**kwargs)
+
+    def __init__(self, generator="full", parse_html=False, dump_dir=None):
+        """ Just initialize ES. """
+        self.html = parse_html
+        self.generator = generator
+        if parse_html:
+            import html2text
+
+            self.html2text = html2text.html2text
+        self.dbname = config["elasticsearch"].get("dbname", "ponymail")
+        ssl = config["elasticsearch"].get("ssl", False)
+        # Always allow this to be set; will be replaced as necessary by wait_for_active_shards
+        self.consistency = config["elasticsearch"].get("write", "quorum")
+        if ES_MAJOR == 2:
+            pass
+        elif ES_MAJOR in [5, 6, 7]:
+            self.wait_for_active_shards = config["elasticsearch"].get("wait", 1)
+        else:
+            raise Exception("Unexpected elasticsearch version ", elasticsearch.VERSION)
+        self.cropout = config.get("debug", {}).get("cropout")
+        uri = config["elasticsearch"].get("uri", "")
+
+        dbs = [
+            {
+                "host": config["elasticsearch"]["hostname"],
+                "port": config["elasticsearch"]["port"],
+                "use_ssl": ssl,
+                "url_prefix": uri,
+                "http_auth": auth,
+                "ca_certs": certifi.where(),
+            }
+        ]
+        # Backup ES?
+        backup = config["elasticsearch"].get("backup")
+        if backup:
+            dbs.append(
+                {
+                    "host": backup,
+                    "port": config["elasticsearch"]["port"],
+                    "use_ssl": ssl,
+                    "url_prefix": uri,
+                    "http_auth": auth,
+                    "ca_certs": certifi.where(),
+                }
+            )
+        # If we have a dump dir, we can risk failing the connection.
+        if dump_dir:
+            try:
+                self.es = elasticsearch.Elasticsearch(
+                    dbs, max_retries=5, retry_on_timeout=True
+                )
+            except elasticsearch.exceptions.ElasticsearchException as e:
+                print(e)
+                print(
+                    "ES connection failed, but dumponfail specified, dumping to %s"
+                    % dump_dir
+                )
+        else:
+            self.es = elasticsearch.Elasticsearch(
+                dbs, max_retries=5, retry_on_timeout=True
+            )
+
+    def message_body(self, msg: email.message.Message, verbose=False, ignore_body=None):
+        body = None
+        first_html = None
+        for part in msg.walk():
+            # can be called from importer
+            if verbose:
+                print("Content-Type: %s" % part.get_content_type())
+            """
+                Find the first body part and the first HTML part
+                Note: cannot use break here because first_html is needed if len(body) <= 1
+            """
+            try:
+                if not body and part.get_content_type() == "text/plain":
+                    body = part.get_payload(decode=True)
+                if not body and part.get_content_type() == "text/enriched":
+                    body = part.get_payload(decode=True)
+                elif (
+                    self.html
+                    and not first_html
+                    and part.get_content_type() == "text/html"
+                ):
+                    first_html = part.get_payload(decode=True)
+            except Exception as err:
+                print(err)
+
+        # this requires a GPL lib, user will have to install it themselves
+        if first_html and (
+            not body
+            or len(body) <= 1
+            or (ignore_body and str(body).find(str(ignore_body)) != -1)
+        ):
+            body = self.html2text(
+                first_html.decode("utf-8", "ignore")
+                if type(first_html) is bytes
+                else first_html
+            )
+
+        # See issue#463
+        # This code will try at most one charset
+        # If the decode fails, it will use utf-8
+        if body is not None:
+            for charset in pm_charsets(msg):
+                try:
+                    body = body.decode(charset) if type(body) is bytes else body
+                    # at this point body can no longer be bytes
+                except UnicodeDecodeError:
+                    body = (
+                        body.decode("utf-8", errors="replace")
+                        if type(body) is bytes
+                        else body
+                    )
+                    # at this point body can no longer be bytes
+
+        return body
+
+    # N.B. this is also called by import-mbox.py
+    def compute_updates(
+        self,
+        args,
+        lid: typing.Optional[str],
+        private: bool,
+        msg: email.message.Message,
+        raw_msg: bytes,
+    ) -> typing.Tuple[typing.Optional[dict], dict, dict, typing.Optional[str]]:
+        """Determine what needs to be sent to the archiver.
+        :param args: Command line arguments for the archiver
+        :param lid: The list id
+        :param private: Whether privately archived email or not (bool)
+        :param msg: The message object
+        :param raw_msg: The raw message bytes
+
+        :return None if the message could not be parsed, otherwise a four-tuple consisting of:
+                the digested email as a dict, its attachments, its metadata fields and any
+                in-reply-to data found.
+        """
+
+        if not lid:
+            lid = normalize_lid(msg.get("list-id"))
+        if self.cropout:
+            crops = self.cropout.split(" ")
+            # Regex replace?
+            if len(crops) == 2:
+                lid = re.sub(crops[0], crops[1], lid)
+            # Standard crop out?
+            else:
+                lid = lid.replace(self.cropout, "")
+
+        def default_empty_string(value):
+            return value and str(value) or ""
+
+        msg_metadata = dict([(k, default_empty_string(msg.get(k))) for k in self.keys])
+        mid = (
+            hashlib.sha224(
+                str("%s-%s" % (lid, msg_metadata["archived-at"])).encode("utf-8")
+            ).hexdigest()
+            + "@"
+            + (lid if lid else "none")
+        )
+        for key in ["to", "from", "subject", "message-id"]:
+            try:
+                hval = ""
+                if msg_metadata.get(key):
+                    for t in email.header.decode_header(msg_metadata[key]):
+                        if t[1] is None or t[1].find("8bit") != -1:
+                            hval += str(
+                                t[0].decode("utf-8") if type(t[0]) is bytes else t[0]
+                            )
+                        else:
+                            hval += t[0].decode(t[1], errors="ignore")
+                    msg_metadata[key] = hval
+            except Exception as err:
+                print("Could not decode headers, ignoring..: %s" % err)
+        message_date = None
+        try:
+            message_date = email.utils.parsedate_tz(str(msg_metadata.get("date")))
+        except ValueError:
+            pass
+        if not message_date and msg_metadata.get("archived-at"):
+            message_date = email.utils.parsedate_tz(
+                str(msg_metadata.get("archived-at"))
+            )
+
+        if not message_date:
+            print(
+                "Date (%s) seems totally wrong, using current UNIX epoch instead."
+                % message_date
+            )
+            epoch = time.time()
+        else:
+            epoch = email.utils.mktime_tz(message_date)
+        # message_date calculations are all done, prepare the index entry
+        date_as_string = time.strftime("%Y/%m/%d %H:%M:%S", time.gmtime(epoch))
+        body = self.message_body(msg, verbose=args.verbose, ignore_body=args.ibody)
+        try:
+            if (
+                msg_metadata.get("content-type")
+                and msg_metadata.get("content-type", "").find("flowed") != -1
+            ):
+                body = formatflowed.convertToWrapped(
+                    bytes(body, "utf-8"), character_set="utf-8"
+                )
+            if isinstance(body, str):
+                body = body.encode("utf-8")
+        except UnicodeEncodeError:
+            try:
+                body = body.decode(chardet.detect(body)["encoding"])
+            except UnicodeDecodeError:
+                try:
+                    body = body.decode("latin-1")
+                except UnicodeDecodeError:
+                    try:
+                        if isinstance(body, str):
+                            body = body.encode("utf-8")
+                    except UnicodeEncodeError:
+                        body = None
+
+        attachments, contents = message_attachments(msg)
+        irt = ""
+
+        output_json = None
+
+        if body is not None or attachments:
+            pmid = mid
+            try:
+                mid = plugins.generators.generate(
+                    archiver_generator, msg, body, lid, attachments, raw_msg
+                )
+            except Exception as err:
+                if logger:
+                    # N.B. use .get just in case there is no message-id
+                    logger.warning(
+                        "Could not generate MID: %s. MSGID: %s",
+                        err,
+                        msg_metadata.get("message-id", "?"),
+                    )
+                mid = pmid
+
+            if "in-reply-to" in msg_metadata:
+                try:
+                    irt_original = msg_metadata["in-reply-to"]
+                    if isinstance(irt_original, list):
+                        irt = "".join(irt_original)
+                    else:
+                        irt = str(irt_original)
+                    if irt:
+                        irt = irt.strip()
+                except ValueError:
+                    irt = ""
+            output_json = {
+                "from_raw": msg_metadata["from"],
+                "from": msg_metadata["from"],
+                "to": msg_metadata["to"],
+                "subject": msg_metadata["subject"],
+                "message-id": msg_metadata["message-id"],
+                "mid": mid,
+                "cc": msg_metadata.get("cc"),
+                "epoch": epoch,
+                "list": lid,
+                "list_raw": lid,
+                "date": date_as_string,
+                "private": private,
+                "references": msg_metadata["references"],
+                "in-reply-to": irt,
+                "body": body.decode("utf-8", "replace")
+                if type(body) is bytes
+                else body,
+                "attachments": attachments,
+            }
+
+        return output_json, contents, msg_metadata, irt
+
+    def archive_message(self, args, mlist, msg, raw_message):
+        """Send the message to the archiver.
+
+        :param args: Command line args (verbose, ibody)
+        :param mlist: The IMailingList object.
+        :param msg: The message object.
+        :param raw_message: Raw message bytes
+
+        :return (lid, mid)
+        """
+
+        lid = normalize_lid(mlist.list_id)
+
+        private = False
+        if hasattr(mlist, "archive_public") and mlist.archive_public is True:
+            private = False
+        elif hasattr(mlist, "archive_public") and mlist.archive_public is False:
+            private = True
+        elif (
+            hasattr(mlist, "archive_policy")
+            and mlist.archive_policy is not ArchivePolicy.public
+        ):
+            private = True
+
+        ojson, contents, msg_metadata, irt = self.compute_updates(
+            args, lid, private, msg, raw_message
+        )
+        sha3 = hashlib.sha3_256(raw_message).hexdigest()
+        if not ojson:
+            _id = msg.get("message-id") or msg.get("Subject") or msg.get("Date")
+            raise Exception("Could not parse message %s for %s" % (_id, lid))
+
+        if args.dry:
+            print("**** Dry run, not saving message to database *****")
+            return lid, ojson["mid"]
+
+        try:
+            if contents:
+                for key in contents:
+                    self.index(
+                        index=self.dbname + "-attachment",
+                        id=key,
+                        body={"source": contents[key]},
+                    )
+
+            self.index(
+                index=self.dbname + "-mbox",
+                id=ojson["mid"],
+                consistency=self.consistency,
+                body=ojson,
+            )
+
+            self.index(
+                index=self.dbname + "-source",
+                id=sha3,
+                consistency=self.consistency,
+                body={
+                    "message-id": msg_metadata["message-id"],
+                    "permalink": ojson["mid"],
+                    "source": self.mbox_source(raw_message),
+                },
+            )
+        # If we have a dump dir and ES failed, push to dump dir instead as a JSON object
+        # We'll leave it to another process to pick up the slack.
+        except Exception as err:
+            print(err)
+            if args.dump:
+                print(
+                    "Pushing to ES failed, but dumponfail specified, dumping JSON docs"
+                )
+                uid = uuid.uuid4()
+                mbox_path = os.path.join(args.dump, "%s.json" % uid)
+                with open(mbox_path, "w") as f:
+                    json.dump(
+                        {
+                            "id": ojson["mid"],
+                            "mbox": ojson,
+                            "mbox_source": {
+                                "id": sha3,
+                                "permalink": ojson["mid"],
+                                "message-id": msg_metadata["message-id"],
+                                "source": self.mbox_source(raw_message),
+                            },
+                            "attachments": contents,
+                        },
+                        f,
+                        indent=2,
+                    )
+                    f.close()
+                sys.exit(0)  # We're exiting here, the rest can't be done without ES
+            # otherwise fail as before
+            raise err
+
+        # If MailMan and list info is present, save/update it in ES:
+        if (
+            hasattr(mlist, "description")
+            and hasattr(mlist, "list_name")
+            and mlist.description
+            and mlist.list_name
+        ):
+            self.index(
+                index=self.dbname + "-mailinglist",
+                id=lid,
+                consistency=self.consistency,
+                body={
+                    "list": lid,
+                    "name": mlist.list_name,
+                    "description": mlist.description,
+                    "private": private,
+                },
+            )
+
+        if logger:
+            logger.info("Pony Mail archived message %s successfully", ojson["mid"])
+        oldrefs = []
+
+        # Is this a direct reply to a pony mail email?
+        if irt != "":
+            dm = re.search(r"pony-([a-f0-9]+)-([a-f0-9]+)@", irt)
+            if dm:
+                cid = dm.group(1)
+                mid = dm.group(2)
+                if self.es.exists(index=self.dbname, doc_type="account", id=cid):
+                    doc = self.es.get(index=self.dbname, doc_type="account", id=cid)
+                    if doc:
+                        oldrefs.append(cid)
+                        # N.B. no id is supplied, so ES will generate one
+                        self.index(
+                            index=self.dbname + "-notification",
+                            consistency=self.consistency,
+                            body={
+                                "type": "direct",
+                                "recipient": cid,
+                                "list": lid,
+                                "private": private,
+                                "date": ojson["date"],
+                                "from": msg_metadata["from"],
+                                "to": msg_metadata["to"],
+                                "subject": msg_metadata["subject"],
+                                "message-id": msg_metadata["message-id"],
+                                "in-reply-to": irt,
+                                "epoch": ojson["epoch"],
+                                "mid": mid,
+                                "seen": 0,
+                            },
+                        )
+                        if logger:
+                            logger.info("Notification sent to %s for %s", cid, mid)
+
+        # Are there indirect replies to pony emails?
+        if msg_metadata.get("references"):
+            for im in re.finditer(
+                r"pony-([a-f0-9]+)-([a-f0-9]+)@", msg_metadata.get("references")
+            ):
+                cid = im.group(1)
+                mid = im.group(2)
+                if self.es.exists(index=self.dbname, doc_type="account", id=cid):
+                    doc = self.es.get(index=self.dbname, doc_type="account", id=cid)
+
+                    # does the user want to be notified of indirect replies?
+                    if (
+                        doc
+                        and "preferences" in doc["_source"]
+                        and doc["_source"]["preferences"].get("notifications")
+                        == "indirect"
+                        and cid not in oldrefs
+                    ):
+                        oldrefs.append(cid)
+                        # N.B. no id is supplied, so ES will generate one
+                        self.index(
+                            index=self.dbname,
+                            consistency=self.consistency,
+                            doc_type="notifications",
+                            body={
+                                "type": "indirect",
+                                "recipient": cid,
+                                "list": lid,
+                                "private": private,
+                                "date": ojson["date"],
+                                "from": msg_metadata["from"],
+                                "to": msg_metadata["to"],
+                                "subject": msg_metadata["subject"],
+                                "message-id": msg_metadata["message-id"],
+                                "in-reply-to": irt,
+                                "epoch": ojson["epoch"],
+                                "mid": mid,
+                                "seen": 0,
+                            },
+                        )
+                        if logger:
+                            logger.info("Notification sent to %s for %s", cid, mid)
+        return lid, ojson["mid"]
+
+    def mbox_source(self, b: bytes) -> str:
+        # Common method shared with import-mbox
+        try:
+            # Can we store as ASCII?
+            return b.decode("ascii", errors="strict")
+        except UnicodeError:
+            # No, so must use base64 to avoid corruption on re-encoding
+            return encode_base64(b)
+
+    def list_url(self, _mlist):
+        """ Required by MM3 plugin API
+        """
+        return None
+
+    def permalink(self, _mlist, _msg):
+        """ Required by MM3 plugin API
+        """
+        return None
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Command line options.")
+    parser.add_argument(
+        "--lid", dest="lid", type=str, nargs=1, help="Alternate specific list ID"
+    )
+    parser.add_argument(
+        "--altheader",
+        dest="altheader",
+        type=str,
+        nargs=1,
+        help="Alternate header for list ID",
+    )
+    parser.add_argument(
+        "--allowfrom",
+        dest="allowfrom",
+        type=str,
+        nargs=1,
+        help="(optional) source IP (mail server) to allow posts from, ignore if no match",
+    )
+    parser.add_argument(
+        "--ignore",
+        dest="ignorefrom",
+        type=str,
+        nargs=1,
+        help="Sender/list to ignore input from (owner etc)",
+    )
+    parser.add_argument(
+        "--private",
+        dest="private",
+        action="store_true",
+        help="This is a private archive",
+    )
+    parser.add_argument(
+        "--makedate",
+        dest="makedate",
+        action="store_true",
+        help="Use the archive timestamp as the email date instead of the Date header",
+    )
+    parser.add_argument(
+        "--quiet",
+        dest="quiet",
+        action="store_true",
+        help="Do not exit -1 if the email could not be parsed",
+    )
+    parser.add_argument(
+        "--verbose",
+        dest="verbose",
+        action="store_true",
+        help="Output additional log messages",
+    )
+    parser.add_argument(
+        "--html2text",
+        dest="html2text",
+        action="store_true",
+        help="Try to convert HTML to text if no text/plain message is found",
+    )
+    parser.add_argument(
+        "--dry",
+        dest="dry",
+        action="store_true",
+        help="Do not save emails to elasticsearch, only test parsing",
+    )
+    parser.add_argument(
+        "--ignorebody",
+        dest="ibody",
+        type=str,
+        nargs=1,
+        help="Optional email bodies to treat as empty (in conjunction with --html2text)",
+    )
+    parser.add_argument(
+        "--dumponfail",
+        dest="dump",
+        help="If pushing to ElasticSearch fails, dump documents in JSON format to this directory and "
+        "fail silently.",
+    )
+    parser.add_argument("--generator", dest="generator", help="Override the generator.")
+    args = parser.parse_args()
+
+    if args.verbose:
+        logging.basicConfig(stream=sys.stdout, level=logging.INFO)
+    else:
+        # elasticsearch logs lots of warnings on retries/connection failure
+        # Also eliminates: 'Undecodable raw error response from server:' warning message
+        logging.getLogger("elasticsearch").setLevel(logging.ERROR)
+
+    archie = Archiver(
+        generator=args.generator or archiver_generator, parse_html=args.html2text
+    )
+    # use binary input so parser can use appropriate charset
+    input_stream = sys.stdin.buffer
+
+    try:
+        raw_message = input_stream.read()
+        try:
+            msg = email.message_from_bytes(raw_message)
+        except Exception as err:
+            print("STDIN parser exception: %s" % err)
+            sys.exit(-1)
+
+        if args.altheader:
+            alt_header = args.altheader[0]
+            if alt_header in msg:
+                try:
+                    msg.replace_header("List-ID", msg.get(alt_header))
+                except KeyError:
+                    msg.add_header("list-id", msg.get(alt_header))
+        elif "altheader" in sys.argv:
+            alt_header = sys.argv[len(sys.argv) - 1]
+            if alt_header in msg:
+                try:
+                    msg.replace_header("List-ID", msg.get(alt_header))
+                except KeyError:
+                    msg.add_header("list-id", msg.get(alt_header))
+
+        # Set specific LID?
+        if args.lid and len(args.lid[0]) > 3:
+            try:
+                msg.replace_header("List-ID", args.lid[0])
+            except KeyError:
+                msg.add_header("list-id", args.lid[0])
+
+        # Ignore based on --ignore flag?
+        if args.ignorefrom:
+            ignore_from = args.ignorefrom[0]
+            if fnmatch.fnmatch(msg.get("from", ""), ignore_from) or (
+                msg.get("list-id") and fnmatch.fnmatch(msg.get("list-id"), ignore_from)
+            ):
+                print("Ignoring message as instructed by --ignore flag")
+                sys.exit(0)
+
+        # Check CIDR if need be
+        if args.allowfrom:
+
+            c = netaddr.IPNetwork(args.allowfrom[0])
+            good = False
+            for line in msg.get_all("received") or []:
+                m = re.search(r"from .+\[(.+)]", line)
+                if m:
+                    try:
+                        ip = netaddr.IPAddress(m.group(1))
+                        if ip in c:
+                            good = True
+                            msg.add_header("ip-whitelisted", "yes")
+                            break
+                    except ValueError:
+                        pass
+                    except netaddr.AddrFormatError:
+                        pass
+            if not good:
+                print("No whitelisted IP found in message, aborting")
+                sys.exit(-1)
+        if args.makedate:  # Replace the Date header with the current time
+            del msg["date"]  # no-op if the message has no Date header
+            msg["date"] = email.utils.formatdate()
+        is_public = True
+        if args.private:
+            is_public = False
+        if "list-id" in msg:
+            if not msg.get("archived-at"):
+                msg.add_header("archived-at", email.utils.formatdate())
+            list_data = collections.namedtuple(
+                "importmsg",
+                [
+                    "list_id",
+                    "archive_public",
+                    "archive_policy",
+                    "list_name",
+                    "description",
+                ],
+            )(
+                list_id=msg.get("list-id"),
+                archive_public=is_public,
+                archive_policy=None,
+                list_name=msg.get("list-id"),
+                description=msg.get("list-id"),
+            )
+
+            try:
+                lid, mid = archie.archive_message(args, list_data, msg, raw_message)
+                print(
+                    "%s: Done archiving to %s as %s!"
+                    % (email.utils.formatdate(), lid, mid)
+                )
+            except Exception as err:
+                if args.verbose:
+                    traceback.print_exc()
+                print("Archiving failed!: %s" % err)
+                raise Exception("Archiving to ES failed")
+        else:
+            print("Nothing to import (no list-id found!)")
+    except Exception as err:
+        # extract the line number without using variables (which may cause issues?)
+        #                           last traceback    1st entry, 2nd field
+        line = traceback.extract_tb(sys.exc_info()[2])[0][1]
+        if args.quiet:
+            print(
+                "Could not parse email, but exiting quietly as --quiet is on: %s (@ %s)"
+                % (err, line)
+            )
+        else:
+            print("Could not parse email: %s (@ %s)" % (err, line))
+            sys.exit(-1)
+
+
+if __name__ == "__main__":
+    main()
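The --allowfrom check in main() accepts a message only if one of its Received headers names a relay address inside the given network. Below is a minimal standalone sketch of that check. It assumes the standard-library ipaddress module in place of netaddr. has_allowed_relay is a hypothetical helper name and is not part of the archiver.

```python
import email
import ipaddress
import re

# Same pattern the archiver uses: capture the bracketed address in "from ... [addr]"
RECEIVED_IP = re.compile(r"from .+\[(.+)]")


def has_allowed_relay(msg, cidr):
    """Return True if any Received header names a relay IP inside `cidr`.

    Hypothetical helper: mirrors the --allowfrom loop, but uses the
    standard-library ipaddress module instead of netaddr.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    for line in msg.get_all("received") or []:
        m = RECEIVED_IP.search(line)
        if not m:
            continue
        try:
            ip = ipaddress.ip_address(m.group(1))
        except ValueError:
            continue  # not a literal address, e.g. "[unknown]"
        if ip in network:  # always False for a v4/v6 mismatch
            return True
    return False


raw = (
    b"Received: from mx.example.org (mx.example.org [192.0.2.10])\n"
    b"\tby archive.example.org; Fri, 14 Aug 2020 13:34:54 +0200\n"
    b"From: someone@example.org\n"
    b"Subject: test\n"
    b"\n"
    b"body\n"
)
message = email.message_from_bytes(raw)
print(has_allowed_relay(message, "192.0.2.0/24"))     # True
print(has_allowed_relay(message, "198.51.100.0/24"))  # False
```

Unlike the archiver, the sketch returns a boolean instead of calling sys.exit(), which makes it easier to exercise in a unit test.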
diff --git a/tools/plugins/generators.py b/tools/plugins/generators.py
new file mode 100644
index 0000000..72e768c
--- /dev/null
+++ b/tools/plugins/generators.py
@@ -0,0 +1,400 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+This file contains the various ID generators for Pony Mail's archivers.
+"""
+
+import base64
+import hashlib
+import email.utils
+import time
+import re
+import typing
+
+# For optional nonce
+config: typing.Optional[dict] = None
+
+# Headers from RFC 4871, the precursor to RFC 6376 (bytes, to match raw header names)
+rfc4871_subset = {
+    b"from", b"sender", b"reply-to", b"subject", b"date", b"message-id",
+    b"to", b"cc", b"mime-version", b"content-type",
+    b"content-transfer-encoding", b"content-id", b"content-description",
+    b"resent-date", b"resent-from", b"resent-sender", b"resent-to",
+    b"resent-cc", b"resent-message-id", b"in-reply-to", b"references",
+    b"list-id", b"list-help", b"list-unsubscribe", b"list-subscribe",
+    b"list-post", b"list-owner", b"list-archive", b"dkim-signature"
+}
+
+# Authenticity headers from RFC 8617
+rfc4871_and_rfc8617_subset = rfc4871_subset | {
+    b"arc-authentication-results", b"arc-message-signature", b"arc-seal"
+}
+
+
+def rfc822_parse_dkim(suffix,
+                      head_canon=False, body_canon=False,
+                      head_subset=None, archive_list_id=None):
+    headers = []
+    keep = True
+    list_ids = set()
+
+    while suffix:
+        # Edge case: headers don't end LF (add LF)
+        line, suffix = (suffix.split(b"\n", 1) + [b""])[:2]
+        if line in {b"\r", b""}:
+            break
+        end = b"\n" if line.endswith(b"\r") else b"\r\n"
+        if line[0] in {0x09, 0x20}:
+            # Edge case: starts with a continuation (treat like From)
+            if headers and (keep is True):
+                headers[-1][1] += line + end
+        elif not line.startswith(b"From "):
+            # Edge case: header start contains no colon (use whole line)
+            # "A field-name MUST be contained on one line." (RFC 822 B.2)
+            k, v = (line.split(b":", 1) + [b""])[:2]
+            k_lower = k.lower()
+            if k_lower == b"list-id":
+                list_ids.add(v.strip())
+            if (head_subset is None) or (k_lower in head_subset):
+                keep = True
+                headers.append([k, v + end])
+            else:
+                keep = False
+    # The remaining suffix is the body
+    body = suffix.replace(b"\r\n", b"\n")
+    body = body.replace(b"\n", b"\r\n")
+
+    # Optional X-Archive-List-ID augmentation (only if the message lacks that List-ID)
+    xali = None if archive_list_id is None else bytes(archive_list_id, "ascii")
+    if (xali is not None) and (xali not in list_ids):
+        headers.append([b"X-Archive-List-ID", b" " + xali])
+    # Optional nonce from local config
+    if config is not None:
+        if (config.get("archiver") and
+                config['archiver'].get('nonce')):
+            nonce = config['archiver'].get('nonce')
+            headers.append([b"X-Archive-Nonce", str(nonce).encode("utf-8")])
+    # Optional head canonicalisation (DKIM relaxed)
+    if head_canon is True:
+        for i in range(len(headers)):
+            k, v = headers[i]
+            crlf = v.endswith(b"\r\n")
+            if crlf is True:
+                v = v[:-2]
+            v = v.replace(b"\r\n", b"")
+            v = v.replace(b"\t", b" ")
+            v = v.strip(b" ")
+            v = b" ".join(vv for vv in v.split(b" ") if vv)
+            if crlf is True:
+                v = v + b"\r\n"
+            headers[i] = [k.lower(), v]
+    # Optional body canonicalisation (DKIM simple)
+    if body_canon is True:
+        while body.endswith(b"\r\n\r\n"):
+            body = body[:-2]
+    return (headers, body)
+
+
+def pibble(hashable, size=10):
+    table = bytes.maketrans(
+        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
+        b"0123456789bcdfghjklmnopqrstvwxyz",
+    )
+    digest = hashlib.sha3_256(hashable).digest()
+    prefix = digest[:size]
+    encoded = base64.b32encode(prefix)
+    return str(encoded.translate(table), "ascii")
+
+
+# DKIM generator: uses DKIM canonicalisation
+# Used by default
+def dkim(_msg, _body, lid, _attachments, raw_msg):
+    """
+    DKIM generator: uses DKIM relaxed/simple canonicalisation
+    We use the headers recommended in RFC 4871, plus DKIM-Signature
+
+    Parameters:
+    _msg - the parsed message (not used)
+    _body - the parsed text content (not used)
+    lid - list id
+    _attachments - list of attachments (not used)
+    raw_msg - the original message bytes
+
+    Returns: str "<pibble>", a sixteen char custom base32 encoded hash
+    """
+    headers, body = rfc822_parse_dkim(raw_msg,
+                                      head_canon=True, body_canon=True,
+                                      head_subset=rfc4871_subset, archive_list_id=lid)
+    hashable = b"".join([h for header in headers for h in header])
+    if body:
+        hashable += b"\r\n" + body
+    # The pibble is the 80-bit SHA3-256 prefix
+    # It is base32 encoded using 0-9 a-z except [aeiu]
+    return pibble(hashable)
+
+
+# Full generator: uses the entire email (including server-dependent data)
+# Used by default until August 2020.
+# See 'dkim' for recommended generation.
+def full(msg, _body, lid, _attachments, _raw_msg):
+    """
+    Full generator: uses the entire email
+    (including server-dependent data)
+    The id is almost certainly unique,
+    but different copies of the message are likely to have different headers, and thus different ids
+
+    Parameters:
+    msg - the parsed message
+    _body - the parsed text content (not used)
+    lid - list id
+    _attachments - list of attachments (not used)
+    _raw_msg - the original message bytes (not used)
+
+    Returns: "<hash>@<lid>" where hash is sha224 of message bytes
+    """
+    mid = "%s@%s" % (hashlib.sha224(msg.as_bytes()).hexdigest(), lid)
+    return mid
+
+
+# Medium: Standard 0.9 generator - Not recommended for future installations.
+# See the 'dkim' or 'cluster' generators instead.
+def medium(msg, body, lid, _attachments, _raw_msg):
+    """
+    Standard 0.9 generator - Not recommended for future installations.
+    (does not generate sufficiently unique ids)
+    Also the lid is included in the hash; this causes problems if the listname needs to be changed.
+
+    N.B. The id is not guaranteed stable - i.e. it may change if the message is reparsed.
+    The id depends on the parsed body, which depends on the exact method used to parse the mail.
+    For example, are invalid characters ignored or replaced; is html parsing used?
+
+    The following message fields are concatenated to form the hash input:
+    - body: if bytes as is else encoded ascii, ignoring invalid characters; if the body is null an Exception is thrown
+    - lid
+    - Date header if it exists and parses OK; failing that
+    - archived-at header if it exists and parses OK; failing that
+    - current time.
+    The resulting date is converted to YYYY/MM/DD HH:MM:SS (using UTC)
+
+    Parameters:
+    msg - the parsed message (used to get the date)
+    body - the parsed text content (may be null)
+    lid - list id
+    _attachments - list of attachments (not used)
+    _raw_msg - the original message bytes (not used)
+
+    Returns: "<hash>@<lid>" where hash is sha224 of the message items noted above
+    """
+
+    # Use text body
+    xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
+    # Use List ID
+    xbody += bytes(lid, encoding='ascii')
+    # Use Date header
+    try:
+        mdate = email.utils.parsedate_tz(msg.get('date'))
+    except Exception:
+        mdate = None
+    # In keeping with preserving the past, we have kept this next section(s).
+    # For all intents and purposes, this is not a proper way of maintaining
+    # a consistent ID in case of missing dates. It is recommended to use
+    # another generator
+    if not mdate and msg.get('archived-at'):
+        mdate = email.utils.parsedate_tz(msg.get('archived-at'))
+    elif not mdate:
+        mdate = time.gmtime()  # Get a standard 9-tuple
+        mdate = mdate + (0,)  # Fake a TZ (10th element)
+    mdatestring = time.strftime("%Y/%m/%d %H:%M:%S", time.gmtime(email.utils.mktime_tz(mdate)))
+    xbody += bytes(mdatestring, encoding='ascii')
+    mid = "%s@%s" % (hashlib.sha224(xbody).hexdigest(), lid)
+    return mid
+
+
+# Original medium generator used for a while in June 2016
+# Committed: https://gitbox.apache.org/repos/asf?p=incubator-ponymail.git;a=commitdiff;h=aa989610
+# Replaced:  https://gitbox.apache.org/repos/asf?p=incubator-ponymail.git;a=commitdiff;h=4732d25f
+# Currently broken: it appends the str lid to the bytes body, which raises TypeError (DO NOT USE)
+def medium_original(msg, body, lid, _attachments, _raw_msg):
+    """
+    NOT RECOMMENDED - does not generate sufficiently unique ids
+    Also the lid is included in the hash; this causes problems if the listname needs to be changed.
+
+    The following message fields are concatenated to form the hash input:
+    - body: if bytes as is else encoded ascii, ignoring invalid characters; if the body is null an Exception is thrown
+    - lid
+    - Date header if it exists and parses OK; converted to UTC seconds since the epoch; else 0
+
+    Parameters:
+    msg - the parsed message (used to get the date)
+    body - the parsed text content (may be null)
+    lid - list id
+    _attachments - list of attachments (not used)
+    _raw_msg - the original message bytes (not used)
+
+    Returns: "<hash>@<lid>" where hash is sha224 of the message items noted above
+    """
+
+    # Use text body
+    xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
+    # Use List ID
+    xbody += lid  # WRONG: Should be: bytes(lid, 'ascii')
+
+    uid_mdate = 0  # mdate for UID generation
+    try:
+        mdate = email.utils.parsedate_tz(msg.get('date'))
+        uid_mdate = email.utils.mktime_tz(mdate)  # Only set if Date header is valid
+    except Exception:
+        pass
+    xbody += bytes(str(uid_mdate), 'ascii')
+    mid = "%s@%s" % (hashlib.sha224(xbody).hexdigest(), lid)
+    return mid
+
+
+# cluster: Use data that is guaranteed to be the same across cluster setups
+# This is the recommended generator for cluster setups.
+# Unlike 'medium', this only makes use of the Date: header and not the archived-at,
+# as the archived-at may change from node to node (and will change if not in the raw mbox file)
+# Also the lid is not included in the hash, so the hash does not change if the lid is overridden
+#
+def cluster(msg, body, lid, attachments, _raw_msg):
+    """
+    Use data that is guaranteed to be the same across cluster setups
+    For mails with a valid Message-ID this is likely to be unique
+    In other cases it is better than the medium generator as it uses several extra fields
+
+    N.B. The id is not guaranteed stable - i.e. it may change if the message is reparsed.
+    The id depends on the parsed body, which depends on the exact method used to parse the mail.
+    For example, are invalid characters ignored or replaced; is html parsing used?
+
+    The following message fields are concatenated to form the hash input:
+    - body as is if bytes else encoded ascii, ignoring invalid characters; if the body is null it is treated as an empty string
+      (currently trailing whitespace is dropped)
+    - Message-ID (if present)
+    - Date header converted to YYYY/MM/DD HH:MM:SS (UTC)
+      or "(null)" if the date does not exist or cannot be converted
+    - sender, encoded as ascii (if the field exists)
+    - subject, encoded as ascii (if the field exists)
+    - the hashes of any attachments
+
+    Note: the lid is not included in the hash.
+
+    Parameters:
+    msg - the parsed message
+    body - the parsed text content
+    lid - list id
+    attachments - list of attachments (uses the hashes)
+    _raw_msg - the original message bytes (not used)
+
+    Returns: "r<hash>@<lid>" where hash is sha224 of the message items noted above
+    """
+    # Use text body
+    if not body:  # Make sure body is not None, which will fail.
+        body = ""
+    xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
+
+    # Crop out any trailing whitespace in body
+    xbody = re.sub(rb"\s+$", b"", xbody)
+
+    # Use Message-Id (or '' if missing)
+    xbody += bytes(msg.get('Message-Id', ''), encoding='ascii')
+
+    # Use Date header. Don't use archived-at, as the archiver sets this if not present.
+    mdate = None
+    mdatestring = "(null)"  # Default to null, ONLY changed if replicable across imports
+    try:
+        mdate = email.utils.parsedate_tz(msg.get('date'))
+        mdatestring = time.strftime("%Y/%m/%d %H:%M:%S", time.gmtime(email.utils.mktime_tz(mdate)))
+    except Exception:
+        pass
+    xbody += bytes(mdatestring, encoding='ascii')
+
+    # Use sender
+    sender = msg.get('from', None)
+    if sender:
+        xbody += bytes(sender, encoding='ascii')
+
+    # Use subject
+    subject = msg.get('subject', None)
+    if subject:
+        xbody += bytes(subject, encoding='ascii')
+
+    # Use attachment hashes if present
+    if attachments:
+        for a in attachments:
+            xbody += bytes(a['hash'], encoding='ascii')
+
+    # generate the hash and combine with the lid to form the id
+    mid = "r%s@%s" % (hashlib.sha224(xbody).hexdigest(), lid)
+    return mid
+
+
+# Old school way of making IDs
+def legacy(msg, body, lid, _attachments, _raw_msg):
+    """
+    Original generator - DO NOT USE
+    (does not generate unique ids)
+
+    The hash input is created from
+    - body: if bytes as is else encoded ascii, ignoring invalid characters; if the body is null an Exception is thrown
+
+    The uid_mdate for the id is the Date converted to UTC epoch else 0
+
+    Parameters:
+    msg - the parsed message (used to get the date)
+    body - the parsed text content (may be null)
+    lid - list id
+    _attachments - list of attachments (not used)
+    _raw_msg - the original message bytes (not used)
+
+    Returns: "<hash>@<uid_mdate>@<lid>" where hash is sha224 of the message items noted above
+    """
+    uid_mdate = 0  # Default if no date found
+    try:
+        mdate = email.utils.parsedate_tz(msg.get('date'))
+        uid_mdate = email.utils.mktime_tz(mdate)  # Only set if Date header is valid
+    except Exception:
+        pass
+    mid = "%s@%s@%s" % (
+        hashlib.sha224(body if type(body) is bytes else body.encode('ascii', 'ignore')).hexdigest(), uid_mdate, lid)
+    return mid
+
+
+__GENERATORS = {
+    'dkim': dkim,
+    'full': full,
+    'medium': medium,
+    'medium_original': medium_original,
+    'cluster': cluster,
+    'legacy': legacy,
+}
+
+
+def generator(name):
+    try:
+        return __GENERATORS[name]
+    except KeyError:
+        print("WARN: generator %s not found, defaulting to 'legacy'" % name)
+        return legacy
+
+
+def generate(name, msg, body, lid, attachments, raw_msg):
+    return generator(name)(msg, body, lid, attachments, raw_msg)
+
+
+def generator_names():
+    return list(__GENERATORS)
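The dkim generator hashes relaxed-canonicalised headers plus the body and keeps only an 80-bit prefix of the SHA3-256 digest. The sketch below restates pibble() from the file above and, separately, the per-header relaxed canonicalisation loop from rfc822_parse_dkim, so both can be checked by hand. relax_header is a hypothetical name for that loop. It is not a full RFC 6376 implementation: for example, it does not remove whitespace between a field name and its colon.

```python
import base64
import hashlib

# Standard RFC 4648 base32 alphabet -> the generator's custom alphabet,
# which is 0-9 plus a-z minus the vowels a, e, i and u.
B32_TABLE = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789bcdfghjklmnopqrstvwxyz",
)


def pibble(hashable: bytes, size: int = 10) -> str:
    """Base32-encode the first `size` bytes of the SHA3-256 digest."""
    prefix = hashlib.sha3_256(hashable).digest()[:size]
    return base64.b32encode(prefix).translate(B32_TABLE).decode("ascii")


def relax_header(name: bytes, value: bytes) -> list:
    """Relaxed header canonicalisation, as done in rfc822_parse_dkim:
    lowercase the name, unfold, turn tabs into spaces, collapse runs of
    spaces and strip the ends, keeping a trailing CRLF if present."""
    crlf = value.endswith(b"\r\n")
    if crlf:
        value = value[:-2]
    value = value.replace(b"\r\n", b"").replace(b"\t", b" ")
    value = b" ".join(part for part in value.split(b" ") if part)
    return [name.lower(), value + (b"\r\n" if crlf else b"")]


header = relax_header(b"Subject", b"  Hello,\r\n\t  world \r\n")
print(header)  # [b'subject', b'Hello, world\r\n']

mid = pibble(b"".join(header))
print(len(mid))                # 16
print(set(mid) & set("aeiu"))  # set()
```

Ten bytes are exactly 80 bits, which base32 encodes as 16 characters with no padding. This is why the dkim docstring promises a sixteen-character ID.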
diff --git a/tools/ponymail.yaml b/tools/ponymail.yaml
new file mode 100644
index 0000000..2817c6e
--- /dev/null
+++ b/tools/ponymail.yaml
@@ -0,0 +1,44 @@
+---
+###############################################################
+# A ponymail.yaml is needed to run this project. This sample config file was
+# originally generated by tools/setup.py.
+#
+# Run the tools/setup.py script and a ponymail.yaml which looks a lot like this
+# one will be generated. If, for whatever reason, that script is not working
+# for you, you may use this ponymail.yaml as a starting point.
+#
+# Contributors should strive to keep this sample updated. One way to do this
+# would be to run tools/setup.py, use the generated config as the new
+# sample, and then paste this message (or a modified form of it) at the
+# top.
+###############################################################
+
+###############################################################
+# Pony Mail Configuration file
+
+
+
+##############################################################
+# THIS IS AN EXAMPLE FOR TESTING - RUN setup.py PLEASE       #
+##############################################################
+
+# Main ES configuration
+elasticsearch:
+    hostname:               localhost
+    dbname:                 ponymail
+    port:                   9200
+    ssl:                    false
+    #uri:                   url_prefix
+    #user:                  username
+    #password:              password
+    #wait:                  active shard count
+    #backup:                database name
+
+archiver:
+    #generator:             medium|full|cluster|dkim|other (dkim recommended)
+    generator:              dkim
+
+debug:
+    #cropout:               string to crop from list-id
+    # e.g. Strip out incubator except at top level
+    cropout:                (\w+\.\w+)\.incubator\.apache\.org \1.apache.org
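The debug.cropout value above packs a regular expression and its replacement into one space-separated string. The code that reads this key is not shown here. Assuming the archiver splits the value on a single space and applies the two halves with re.sub, falling back to plain substring removal otherwise, its effect on a List-ID can be sketched with a hypothetical helper:

```python
import re


def crop_list_id(lid: str, cropout: str) -> str:
    """Apply a debug.cropout rule to a List-ID (hypothetical helper).

    Assumes the rule is either "PATTERN REPLACEMENT" (two parts separated
    by a single space, applied with re.sub) or a plain substring to delete.
    """
    parts = cropout.split(" ")
    if len(parts) == 2:
        pattern, replacement = parts
        return re.sub(pattern, replacement, lid)
    return lid.replace(cropout, "")


# The value from the sample ponymail.yaml, as YAML would load it
rule = r"(\w+\.\w+)\.incubator\.apache\.org \1.apache.org"
print(crop_list_id("<dev.ponymail.incubator.apache.org>", rule))  # <dev.ponymail.apache.org>
print(crop_list_id("<general.incubator.apache.org>", rule))       # unchanged
```

The second call shows the "except at top level" part of the comment. A list directly under incubator.apache.org has no second label before ".incubator", so the pattern does not match.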
diff --git a/tools/setup.py b/tools/setup.py
new file mode 100755
index 0000000..f88c3e6
--- /dev/null
+++ b/tools/setup.py
@@ -0,0 +1,481 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import importlib.util
+import logging
+import os
+import os.path
+import shutil
+import sys
+import yaml
+
+if sys.version_info < (3, 8):
+    print("This script requires Python 3.8 or higher")
+    sys.exit(-1)
+
+# Check for all required python packages
+wanted_pkgs = [
+    "elasticsearch",  # used by setup.py, archiver.py and elastic.py
+    "formatflowed",  # used by archiver.py
+    "netaddr",  # used by archiver.py
+    "certifi",  # used by archiver.py and elastic.py
+]
+
+missing_pkgs = list(wanted_pkgs)  # copy to avoid corruption
+for pkg in wanted_pkgs:
+    if importlib.util.find_spec(pkg):
+        missing_pkgs.remove(pkg)
+
+if missing_pkgs:
+    print("It looks like you need to install some python modules first")
+    print("The following packages are required: ")
+    for pkg in missing_pkgs:
+        print(" - %s" % pkg)
+    print("You may use your package manager, or run the following command:")
+    print("pip3 install %s" % " ".join(missing_pkgs))
+    sys.exit(-1)
+
+
+# at this point we can assume elasticsearch is present
+from elasticsearch import VERSION as ES_VERSION
+from elasticsearch import ConnectionError as ES_ConnectionError
+from elasticsearch import Elasticsearch, ElasticsearchException
+
+ES_MAJOR = ES_VERSION[0]
+
+# CLI arg parsing
+parser = argparse.ArgumentParser(description="Command line options.")
+
+parser.add_argument(
+    "--defaults", dest="defaults", action="store_true", help="Use default settings"
+)
+parser.add_argument("--dbprefix", dest="dbprefix")
+parser.add_argument(
+    "--clobber",
+    dest="clobber",
+    action="store_true",
+    help="Allow overwrite of ponymail.cfg & ../site/api/lib/config.lua (default: create *.tmp if either exists)",
+)
+parser.add_argument("--dbhost", dest="dbhost", type=str, help="ES backend hostname")
+parser.add_argument("--dbport", dest="dbport", type=str, help="DB port")
+parser.add_argument("--dbname", dest="dbname", type=str, help="ES DB name")
+parser.add_argument("--dbshards", dest="dbshards", type=int, help="DB Shard Count")
+parser.add_argument(
+    "--dbreplicas", dest="dbreplicas", type=int, help="DB Replica Count"
+)
+parser.add_argument(
+    "--mailserver",
+    dest="mailserver",
+    type=str,
+    help="Host name of outgoing mail server",
+)
+parser.add_argument(
+    "--mldom", dest="mldom", type=str, help="Domains to accept mail for via UI"
+)
+parser.add_argument(
+    "--wordcloud", dest="wc", action="store_true", help="Enable word cloud"
+)
+parser.add_argument(
+    "--skiponexist",
+    dest="soe",
+    action="store_true",
+    help="Skip setup if ES index exists",
+)
+parser.add_argument(
+    "--noindex",
+    dest="noi",
+    action="store_true",
+    help="Don't make an ES index, assume it exists",
+)
+parser.add_argument(
+    "--nocloud", dest="nwc", action="store_true", help="Do not enable word cloud"
+)
+parser.add_argument(
+    "--generator",
+    dest="generator",
+    type=str,
+    help="Document ID Generator to use (legacy, medium, cluster, full, dkim)",
+)
+args = parser.parse_args()
+
+print("Welcome to the Pony Mail setup script!")
+print("Let's start by determining some settings...")
+print("")
+
+
+hostname = ""
+port = 0
+dbname = ""
+mlserver = ""
+mldom = ""
+wc = ""
+genname = ""
+wce = False
+shards = 0
+replicas = -1
+urlPrefix = None
+
+# If called with --defaults (like from Docker), use default values
+if args.defaults:
+    hostname = "localhost"
+    port = 9200
+    dbname = "ponymail"
+    mlserver = "localhost"
+    mldom = "example.org"
+    wc = "Y"
+    wce = True
+    shards = 1
+    replicas = 0
+    genname = "cluster"
+    urlPrefix = ""
+
+# Accept CLI args, copy them
+if args.dbprefix:
+    urlPrefix = args.dbprefix
+if args.dbhost:
+    hostname = args.dbhost
+if args.dbport:
+    port = int(args.dbport)
+if args.dbname:
+    dbname = args.dbname
+if args.mailserver:
+    mlserver = args.mailserver
+if args.mldom:
+    mldom = args.mldom
+if args.wc:
+    wc, wce = "Y", True
+if args.nwc:
+    wc, wce = "N", False
+if args.dbshards:
+    shards = args.dbshards
+if args.dbreplicas is not None:
+    replicas = args.dbreplicas
+if args.generator:
+    genname = args.generator
+
+while hostname == "":
+    hostname = input(
+        "What is the hostname of the ElasticSearch server? (e.g. localhost): "
+    )
+
+while urlPrefix is None:
+    urlPrefix = input("Database URL prefix if any (hit enter if none): ")
+
+while port < 1:
+    try:
+        port = int(input("What port is ElasticSearch listening on? (normally 9200): "))
+    except ValueError:
+        pass
+
+while dbname == "":
+    dbname = input("What would you like to call the mail index (e.g. ponymail): ")
+
+while mlserver == "":
+    mlserver = input(
+        "What is the hostname of the outgoing mailserver? (e.g. mail.foo.org): "
+    )
+
+while mldom == "":
+    mldom = input(
+        "Which domains would you accept mail to from web-replies? (e.g. foo.org or *): "
+    )
+
+while wc == "":
+    wc = input("Would you like to enable the word cloud feature? (Y/N): ")
+    if wc.lower() == "y":
+        wce = True
+
+while genname == "":
+    gens = ["legacy", "medium", "cluster", "full", "dkim"]
+    print("Please select a document ID generator:")
+    print(
+        "1  LEGACY: The original document generator for v/0.1-0.8 (no longer recommended)"
+    )
+    print(
+        "2  MEDIUM: The medium comprehensive generator for v/0.9 (no longer recommended)"
+    )
+    print("3  REDUNDANT: Near-full message digest, discard MTA trail")
+    print("4  FULL: Full message digest with MTA trail")
+    print(
+        "5  [RECOMMENDED] DKIM/RFC-6376: Short SHA3 hash useful for cluster setups with permalink usage"
+    )
+    try:
+        gno = int(input("Please select a generator [1-5]: "))
+        if 1 <= gno <= len(gens):
+            genname = gens[gno - 1]
+    except ValueError:
+        pass
+
+if genname == "dkim":
+    print(
+        "DKIM hasher chosen. It is recommended that you set a cryptographic nonce for this generator, though it is not required."
+    )
+    print(
+        "If you set a nonce, you will need this same nonce for future installations if you intend to preserve "
+    )
+    print("permalinks from imported messages.")
+    nonce = (
+        input("Enter your nonce or hit [enter] to continue without a nonce: ") or None
+    )
+
+while shards < 1:
+    try:
+        shards = int(input("How many shards for the ElasticSearch index? "))
+    except ValueError:
+        pass
+
+while replicas < 0:
+    try:
+        replicas = int(input("How many replicas for each shard? "))
+    except ValueError:
+        pass
+
+print("Okay, I got all I need, setting up Pony Mail...")
+
+
+def createIndex():
+    # Check if index already exists
+    if es.indices.exists(dbname + "-mbox"):
+        if args.soe:
+            print(
+                "ElasticSearch indices with prefix '%s' already exist and --skiponexist is set, exiting quietly"
+                % dbname
+            )
+            sys.exit(0)
+        else:
+            print("Error: Existing ElasticSearch indices with prefix '%s' already exist!" % dbname)
+            sys.exit(-1)
+
+    print(f"Creating indices {dbname}-*...")
+
+    settings = {"number_of_shards": shards, "number_of_replicas": replicas}
+
+    mappings = {
+        "mbox": {
+            "properties": {
+                "@import_timestamp": {
+                    "type": "date",
+                    "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd",
+                },
+                "attachments": {
+                    "properties": {
+                        "content_type": {"type": "keyword",},
+                        "filename": {"type": "keyword",},
+                        "hash": {"type": "keyword",},
+                        "size": {"type": "long"},
+                    }
+                },
+                "body": {"type": "text"},
+                "cc": {"type": "text"},
+                "date": {
+                    "type": "date",
+                    "store": True,
+                    "format": "yyyy/MM/dd HH:mm:ss",
+                },
+                "epoch": {"type": "long",},  # number of seconds since the epoch
+                "from": {"type": "text"},
+                "from_raw": {"type": "keyword",},
+                "in-reply-to": {"type": "keyword",},
+                "list": {"type": "text"},
+                "list_raw": {"type": "keyword",},
+                "message-id": {"type": "keyword",},
+                "mid": {"type": "keyword"},
+                "private": {"type": "boolean"},
+                "permalink": {"type": "keyword"},
+                "references": {"type": "text"},
+                "subject": {"type": "text", "fielddata": True},
+                "to": {"type": "text"},
+            }
+        },
+        "attachment": {"properties": {"source": {"type": "binary"}}},
+        "source": {
+            "properties": {
+                "source": {"type": "binary"},
+                "message-id": {"type": "keyword",},
+                "permalink": {"type": "keyword"},
+                "mid": {"type": "keyword"},
+            }
+        },
+        "mailinglist": {
+            "properties": {
+                "description": {"type": "keyword",},
+                "list": {"type": "keyword",},
+                "name": {"type": "keyword",},
+            }
+        },
+        "account": {
+            "properties": {
+                "cid": {"type": "keyword",},
+                "credentials": {
+                    "properties": {
+                        "altemail": {"type": "object"},
+                        "email": {"type": "keyword",},
+                        "fullname": {"type": "keyword",},
+                        "uid": {"type": "keyword",},
+                    }
+                },
+                "internal": {
+                    "properties": {
+                        "cookie": {"type": "keyword",},
+                        "ip": {"type": "keyword",},
+                        "oauth_used": {"type": "keyword",},
+                    }
+                },
+                "request_id": {"type": "keyword",},
+            }
+        },
+        "notification": {
+            "properties": {
+                "date": {
+                    "type": "date",
+                    "store": True,
+                    "format": "yyyy/MM/dd HH:mm:ss",
+                },
+                "epoch": {"type": "long"},
+                "from": {"type": "text",},
+                "in-reply-to": {"type": "keyword",},
+                "list": {"type": "text",},
+                "message-id": {"type": "keyword",},
+                "mid": {"type": "text",},
+                "private": {"type": "boolean"},
+                "recipient": {"type": "keyword",},
+                "seen": {"type": "long"},
+                "subject": {"type": "keyword",},
+                "to": {"type": "text",},
+                "type": {"type": "keyword",},
+            }
+        },
+    }
+
+    for index, mapping in mappings.items():
+        res = es.indices.create(
+            index=f"{dbname}-{index}", body={"mappings": mapping, "settings": settings}
+        )
+
+        print(f"Index {dbname}-{index} created! {res}")
+
+
+# we need to connect to database to determine the engine version
+es = Elasticsearch(
+    [{"host": hostname, "port": port, "use_ssl": False, "url_prefix": urlPrefix}],
+    max_retries=5,
+    retry_on_timeout=True,
+)
+
+# elasticsearch logs lots of warnings on retries/connection failure
+logging.getLogger("elasticsearch").setLevel(logging.ERROR)
+
+try:
+    DB_VERSION = es.info()["version"]["number"]
+except ES_ConnectionError:
+    print("WARNING: Connection error: could not determine the engine version.")
+    DB_VERSION = "0.0.0"
+
+DB_MAJOR = int(DB_VERSION.split(".")[0])
+print(
+    "Versions: library %d (%s), engine %d (%s)"
+    % (ES_MAJOR, ".".join(map(str, ES_VERSION)), DB_MAJOR, DB_VERSION)
+)
+if DB_MAJOR < 7:
+    print("This version of Pony Mail requires ElasticSearch 7.x or higher")
+
+if DB_MAJOR != ES_MAJOR:
+    print("WARNING: library version does not agree with engine version!")
+
+if DB_MAJOR == 0:  # not known
+    if args.noi:
+        # allow setup to be used without engine running
+        print(
+            "Could not determine the engine version. Assume it is the same as the library version."
+        )
+        DB_MAJOR = ES_MAJOR
+    else:
+        # if we cannot connect to get the version, we cannot create the index later
+        print("Could not connect to the engine. Fatal.")
+        sys.exit(1)
+
+if not args.noi:
+    try:
+        createIndex()
+    except ElasticsearchException as e:
+        print("Index creation failed: %s" % e)
+        sys.exit(1)
+
+ponymail_cfg = "ponymail.yaml"
+if not args.clobber and os.path.exists(ponymail_cfg):
+    print("%s exists and clobber is not set" % ponymail_cfg)
+    ponymail_cfg = "ponymail.yaml.tmp"
+
+print("Writing importer config (%s)" % ponymail_cfg)
+
+with open(ponymail_cfg, "w") as f:
+    f.write(
+        """
+---
+###############################################################
+# A ponymail.yaml is needed to run this project. This sample config file was
+# originally generated by tools/setup.py.
+#
+# Run the tools/setup.py script and a ponymail.yaml which looks a lot like this
+# one will be generated. If, for whatever reason, that script is not working
+# for you, you may use this file as a starting point.
+#
+# Contributors should strive to keep this sample updated. One way to do this
+# would be to run tools/setup.py, rename the generated config to
+# ponymail.cfg.sample, and then paste this message or a modified form of
+# this message at the top.
+###############################################################
+
+###############################################################
+# Pony Mail Configuration file
+
+
+# Main ES configuration
+elasticsearch:
+    hostname:               %s
+    dbname:                 %s
+    port:                   %d
+    ssl:                    false
+    #uri:                   url_prefix
+    #user:                  username
+    #password:              password
+    #wait:                  active shard count
+    #backup:                database name
+
+archiver:
+    #generator:             medium|full|cluster|dkim|other (dkim recommended)
+    generator:              %s
+    nonce:                  %s
+
+debug:
+    #cropout:               string to crop from list-id
+
+            """
+        % (hostname, dbname, port, genname, nonce or "~")
+    )
+
+print("Copying sample JS config to config.js (if needed)...")
+if not os.path.exists("../site/js/config.js") and os.path.exists(
+    "../site/js/config.js.sample"
+):
+    shutil.copy("../site/js/config.js.sample", "../site/js/config.js")
+
+
+print("All done, Pony Mail should...work now :)")
+print(
+    "If you are using an external mail inbound server, \nmake sure to copy the contents of this tools directory to it"
+)
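The `createIndex()` loop above creates one index per document kind (`mbox`, `attachment`, `source`, `mailinglist`, `account`, `notification`), each named `<dbname>-<kind>` and sharing the same shard/replica settings. A minimal sketch of that naming scheme follows. It assumes a hypothetical helper, `build_index_requests`, that only assembles the `(index, body)` pairs and never contacts ElasticSearch.

```python
from typing import Any, Dict, List, Tuple

Body = Dict[str, Any]


def build_index_requests(
    dbname: str, shards: int, replicas: int, mappings: Dict[str, Body]
) -> List[Tuple[str, Body]]:
    """Hypothetical helper: pair each index name with its create-request body.

    Mirrors the loop in setup.py: every top-level key of `mappings` becomes
    its own index named "<dbname>-<key>", and all indices share the same
    shard/replica settings.
    """
    settings = {"number_of_shards": shards, "number_of_replicas": replicas}
    return [
        (f"{dbname}-{kind}", {"mappings": mapping, "settings": settings})
        for kind, mapping in mappings.items()
    ]


if __name__ == "__main__":
    sample = {
        "mbox": {"properties": {"mid": {"type": "keyword"}}},
        "source": {"properties": {"source": {"type": "binary"}}},
    }
    for name, body in build_index_requests("ponymail", 3, 1, sample):
        # In setup.py each pair goes to es.indices.create(index=..., body=...).
        print(name, body["settings"])
```

Separating the request bodies from the `es.indices.create` call lets you check the naming without a running engine. It also avoids reusing the name `mappings` for the loop variable, as the original loop does.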

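The engine-version handling in setup.py reduces to a small decision. If the version cannot be read, it is recorded as "0.0.0". That is fatal unless `args.noi` is set, in which case the script assumes the library's major version. The sketch below restates that decision as a pure function. The name `check_engine_version` and its return shape are hypothetical. Unlike the script, the sketch handles the unknown-version case before it emits the version warnings.

```python
from typing import List, Tuple

MIN_ENGINE_MAJOR = 7  # setup.py warns when the engine is older than 7.x


def check_engine_version(
    db_version: str, lib_major: int, no_index: bool
) -> Tuple[int, List[str]]:
    """Hypothetical helper: return the effective engine major version and warnings.

    `db_version` is the engine's version string, or "0.0.0" when it could not
    be fetched (the same fallback setup.py uses for DB_VERSION).
    Raises SystemExit when the version is unknown and indices must be created.
    """
    major = int(db_version.split(".")[0])
    if major == 0:
        if not no_index:
            # Without a reachable engine there is nowhere to create indices.
            raise SystemExit("Could not connect to the engine. Fatal.")
        # Setup may run without an engine; assume the library's major version.
        return lib_major, ["Could not determine the engine version; assuming library version."]
    warnings: List[str] = []
    if major < MIN_ENGINE_MAJOR:
        warnings.append("This version of Pony Mail requires ElasticSearch 7.x or higher")
    if major != lib_major:
        warnings.append("library version does not agree with engine version")
    return major, warnings


if __name__ == "__main__":
    print(check_engine_version("7.8.1", 7, False))  # (7, [])
```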
[incubator-ponymail-foal] 02/02: Switch to mypy testing

Posted by hu...@apache.org.


humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git

commit f71908c7582094afd30646dfea32777396e6436f
Author: Daniel Gruno <hu...@apache.org>
AuthorDate: Fri Aug 14 13:36:31 2020 +0200

    Switch to mypy testing
---
 .travis.yml | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index e3b00fb..577f423 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -19,8 +19,11 @@ notifications:
     recipients:
       - dev@ponymail.apache.org
 
+before_script:
+  - pip install mypy
+
 script:
-  - echo "nothing to do here yet"
+  - mypy --ignore-missing-imports archiver.py
 
 jobs:
   include: