Posted to commits@ponymail.apache.org by se...@apache.org on 2019/09/06 17:20:44 UTC

[incubator-ponymail] branch master updated: Document stability issue

This is an automated email from the ASF dual-hosted git repository.

sebb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail.git


The following commit(s) were added to refs/heads/master by this push:
     new 7ef7dc7  Document stability issue
7ef7dc7 is described below

commit 7ef7dc7f675ab4f44fcd620f9655c9881c0fee04
Author: Sebb <se...@apache.org>
AuthorDate: Fri Sep 6 18:20:40 2019 +0100

    Document stability issue
---
 tools/generators.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/generators.py b/tools/generators.py
index 9a84693..528ca16 100644
--- a/tools/generators.py
+++ b/tools/generators.py
@@ -52,6 +52,10 @@ def medium(msg, body, lid, _attachments):
     (does not generate sufficiently unique ids)
     Also the lid is included in the hash; this causes problems if the listname needs to be changed.
 
+    N.B. The id is not guaranteed to be stable, i.e. it may change if the message is reparsed.
+    The id depends on the parsed body, which in turn depends on the exact method used to parse the mail,
+    for example whether invalid characters are ignored or replaced, and whether HTML parsing is used.
+
     The following message fields are concatenated to form the hash input:
     - body: if bytes as is else encoded ascii, ignoring invalid characters; if the body is null an Exception is thrown
     - lid
@@ -141,6 +145,10 @@ def cluster(msg, body, lid, attachments):
     For mails with a valid Message-ID this is likely to be unique
     In other cases it is better than the medium generator as it uses several extra fields
 
+    N.B. The id is not guaranteed to be stable, i.e. it may change if the message is reparsed.
+    The id depends on the parsed body, which in turn depends on the exact method used to parse the mail,
+    for example whether invalid characters are ignored or replaced, and whether HTML parsing is used.
+
     The following message fields are concatenated to form the hash input:
     - body as is if bytes else encoded ascii, ignoring invalid characters; if the body is null it is treated as an empty string
       (currently trailing whitespace is dropped)
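
The stability issue documented in this commit can be illustrated with a minimal sketch. It is not the Pony Mail implementation: `sketch_id`, the sample message bytes, the list id, and the choice of SHA-256 are all assumptions made for illustration. It only mirrors the documented rule that a bytes body is hashed as is, while a string body is first encoded as ASCII with invalid characters ignored.

```python
import hashlib


def sketch_id(body, lid):
    """Hypothetical body-based id: hash of the body followed by the list id.

    Mirrors the documented rule only: bytes are used as is, while str is
    encoded as ASCII with unencodable characters ignored.
    """
    if not isinstance(body, bytes):
        body = body.encode('ascii', 'ignore')
    # SHA-256 is an assumption here, chosen only for the illustration.
    return hashlib.sha256(body + lid.encode('ascii', 'ignore')).hexdigest()


# Hypothetical message body containing a non-ASCII byte (Latin-1 "e acute").
raw = b"Caf\xe9 au lait\n"
lid = "<dev.example.org>"  # hypothetical list id

# Parse method A: the parser hands the raw bytes straight to the generator.
id_from_bytes = sketch_id(raw, lid)

# Parse method B: the parser decodes to str first, dropping invalid bytes.
id_from_str = sketch_id(raw.decode('utf-8', 'ignore'), lid)

# Same message, two parse methods, two different ids.
print(id_from_bytes == id_from_str)
```

Under these assumptions the same message receives a different id depending on how its body was parsed, so an id recorded at first import may not match the id computed when the message is reparsed.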