Posted to issues@ponymail.apache.org by GitBox <gi...@apache.org> on 2020/03/16 10:30:30 UTC

[GitHub] [incubator-ponymail] sebbASF opened a new issue #513: Bug: cluster generator fails to parse non-ascii subject and sender

sebbASF opened a new issue #513: Bug: cluster generator fails to parse non-ascii subject and sender
URL: https://github.com/apache/incubator-ponymail/issues/513
 
 
   The cluster generator reads fields from msg on the assumption that they are strings.
   
   However, that is not the case when a header contains non-ASCII characters.
   
   In such cases, a call such as msg.get('subject') returns an email.header.Header object rather than a str.
   This causes code such as bytes(subject, encoding = 'ascii') to fail with
   
   TypeError: encoding without a string argument
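
   The failure can be reproduced with the standard library alone. The following is a minimal sketch, assuming the default (compat32) parsing policy and a raw message whose Subject carries unencoded UTF-8 bytes; the sample message is made up for illustration and is not taken from the archiver.

```python
import email
from email.header import Header

# Hypothetical raw message: the Subject holds raw UTF-8 bytes,
# i.e. it is NOT RFC 2047 encoded.
raw = b"Subject: Gr\xc3\xbc\xc3\x9fe\nFrom: a@example.org\n\nbody\n"
msg = email.message_from_bytes(raw)

subject = msg.get('subject')
# With the default policy, a header containing raw 8-bit data comes back
# as a Header object instead of a str.
is_header = isinstance(subject, Header)

try:
    bytes(subject, encoding='ascii')
    error = None
except TypeError as exc:
    error = str(exc)

print(is_header, error)
```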
   
   In turn, this causes the archiver to revert to a very basic fallback mid:
   
           mid = hashlib.sha224(str("%s-%s" % (lid, msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid if lid else "none")
   
   **Unless archived-at is defined, this will be constant for a given list id**
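
   To illustrate the collision, here is a small sketch that wraps the fallback expression quoted above in a hypothetical helper, `fallback_mid`. It assumes that an undefined archived-at is represented as `None`; how the archiver actually represents the missing value is not stated here, and the list id is an example value.

```python
import hashlib

def fallback_mid(lid, archived_at):
    """Hypothetical wrapper around the fallback expression from the issue."""
    digest = hashlib.sha224(
        str("%s-%s" % (lid, archived_at)).encode('utf-8')
    ).hexdigest()
    return digest + "@" + (lid if lid else "none")

lid = "<dev.example.apache.org>"  # example list id

# Two different messages, neither with archived-at (assumed to be None):
mid_a = fallback_mid(lid, None)
mid_b = fallback_mid(lid, None)

# A message that does have archived-at gets a different mid.
mid_c = fallback_mid(lid, "Mon, 16 Mar 2020 10:30:30 -0000")

print(mid_a == mid_b)  # both messages collapse onto the same mid
print(mid_a == mid_c)
```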
   
   This is relatively easy to fix; the generator should use the msg_metadata dict which
   the archiver has already set up.
   
   HOWEVER, to ensure that it's possible to regenerate the same Permalinks, any fix MUST be implemented as a new generator type, with a new syntax (i.e. change the 'r' prefix).
   
   There are probably some other changes that need to be made to the cluster generator.
   For example, Message-Id should be canonicalised.
   
   Note that the fallback mid cannot be changed, as that would affect all the generators.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-ponymail] Humbedooh commented on issue #513: Bug: cluster generator fails to parse non-ascii subject and sender

Posted by GitBox <gi...@apache.org>.
Humbedooh commented on issue #513: Bug: cluster generator fails to parse non-ascii subject and sender
URL: https://github.com/apache/incubator-ponymail/issues/513#issuecomment-599580254
 
 
   Let's bump the 'r' prefix then. I'd suggest moving to next char, 'v' :)
   So, am I understanding this right: the generators should be passed msg_metadata instead of msg in the first variable slot, and that would address the main issue? If so, we'd probably be best off accessing the fields with `msg.get('subject', '')`, since they may not have been present in the email and this would avoid a key error.
   
   What do you mean by message-id being canonicalized?


[GitHub] [incubator-ponymail] sebbASF commented on issue #513: Bug: cluster generator fails to parse non-ascii subject and sender

Posted by GitBox <gi...@apache.org>.
sebbASF commented on issue #513: Bug: cluster generator fails to parse non-ascii subject and sender
URL: https://github.com/apache/incubator-ponymail/issues/513#issuecomment-599649246
 
 
   No, it's not possible to replace msg by msg_metadata, because that will change the output from the existing generators (at least with some input).
   
   I was thinking of adding the metadata as another parameter.
   It would only be used by the new generator(s).
   This would preserve the existing behaviour.
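
   One way this could look is sketched below. Everything here is hypothetical: the function names, the parameter order, the hashed fields and the 'r'/'v' prefixes are stand-ins chosen for illustration, not the actual Pony Mail generator code. The point is only that an existing generator which ignores the extra parameter produces the same output whether or not metadata is supplied.

```python
import hashlib

def _sha224(text):
    return hashlib.sha224(text.encode('utf-8')).hexdigest()

def old_style(msg, body, lid, attachments, msg_metadata=None):
    """Stand-in for an existing generator: never looks at msg_metadata."""
    return "r" + _sha224("%s-%s" % (lid, msg.get('message-id', '')))

def new_style(msg, body, lid, attachments, msg_metadata=None):
    """Stand-in for a new generator: reads only the pre-decoded metadata."""
    meta = msg_metadata or {}
    return "v" + _sha224(
        "%s-%s-%s" % (lid, meta.get('subject', ''), meta.get('from', ''))
    )

# Plain dicts stand in for the parsed message and the archiver's metadata.
msg = {'message-id': '<1@example.org>'}
meta = {'subject': 'Gr\u00fc\u00dfe', 'from': 'a@example.org'}
lid = "<dev.example.org>"

before = old_style(msg, "", lid, None)
after = old_style(msg, "", lid, None, msg_metadata=meta)
new_id = new_style(msg, "", lid, None, msg_metadata=meta)

print(before == after)  # the old generator's output does not change
```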
   
   As to canonicalisation, headers may be wrapped in transit.
   For maximum portability, they should be unwrapped before use.
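
   As a rough sketch of what unwrapping could mean, the hypothetical helper below removes folding line breaks (a line break followed by a space or tab) and trims surrounding whitespace. It is an illustration of the idea only, not the canonicalisation that any generator currently performs, and the Message-Id value is invented.

```python
import re

def unfold_header(value):
    """Remove folding line breaks, then strip surrounding whitespace."""
    return re.sub(r'\r?\n(?=[ \t])', '', value).strip()

# The same Message-Id, once wrapped in transit and once not.
wrapped = "\r\n <CAF-abc123@mail.example.org>"
plain = "<CAF-abc123@mail.example.org>"

canonical_wrapped = unfold_header(wrapped)
canonical_plain = unfold_header(plain)

print(canonical_wrapped == canonical_plain)
```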
