Posted to issues@ponymail.apache.org by GitBox <gi...@apache.org> on 2020/08/17 22:45:46 UTC

[GitHub] [incubator-ponymail] sebbASF opened a new issue #519: Bug: email parser mishandles old-style boundaries

sebbASF opened a new issue #519:
URL: https://github.com/apache/incubator-ponymail/issues/519


   The code that parses boundary strings strips a surrounding pair of angle brackets (<>) from the value. This breaks parsing of some messages, for example the unit test corpus file tomcat-ancient-boundary.mbox, which has the following boundary:
   
   Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"
   
   Once parsed, the boundary becomes "<001-3e1dcd5a-119e>", which no longer matches the delimiter lines in the message body, so the parts are not found.
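
   A minimal sketch of the mechanism, assuming the stripping comes from email.utils.unquote being applied to the parameter value more than once: each call removes one enclosing pair of double quotes or angle brackets. The variable names are illustrative only.

```python
from email.utils import unquote

# Parameter value as it appears in the Content-Type header.
raw_param = '"<<001-3e1dcd5a-119e>>"'

once = unquote(raw_param)   # removes the surrounding double quotes
twice = unquote(once)       # also removes one surrounding <...> pair

print(once)
print(twice)
```

   The second result is the shortened boundary that fails to match the delimiter lines.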
   
   There are two Python bug reports for this:
   https://bugs.python.org/issue28945
   https://bugs.python.org/issue29020
   but unfortunately no fix in sight.
   
   It's possible to monkey-patch the library by providing a replacement copy of the method email.utils.collapse_rfc2231_value.
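
   One way such a monkey-patch could look is sketched below. This is not the attached test code. It assumes that plain (non-tuple) values reaching collapse_rfc2231_value have already been unquoted by the caller, so they are returned unchanged. The helper name keep_angle_brackets and the sample message are hypothetical.

```python
import contextlib
import email
import email.utils


@contextlib.contextmanager
def keep_angle_brackets():
    """Temporarily replace email.utils.collapse_rfc2231_value (hypothetical helper)."""
    original = email.utils.collapse_rfc2231_value

    def patched(value, errors="replace", fallback_charset="us-ascii"):
        # RFC 2231 (charset, language, text) tuples still go through the
        # stock implementation so that encoded values are decoded.
        if isinstance(value, tuple) and len(value) == 3:
            return original(value, errors, fallback_charset)
        # Assumption: plain strings were already unquoted by the caller,
        # so return them untouched instead of unquoting a second time.
        return value

    email.utils.collapse_rfc2231_value = patched
    try:
        yield
    finally:
        # Always restore the library function.
        email.utils.collapse_rfc2231_value = original


# Made-up sample message using the boundary from the report.
RAW = (
    'Content-Type: multipart/mixed; boundary="<<001-3e1dcd5a-119e>>"\n'
    "MIME-Version: 1.0\n"
    "\n"
    "--<<001-3e1dcd5a-119e>>\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--<<001-3e1dcd5a-119e>>--\n"
)

with keep_angle_brackets():
    msg = email.message_from_string(RAW)
    boundary = msg.get_boundary()

print(boundary, msg.is_multipart())
```

   Scoping the patch with a context manager keeps the change local to the import step. Code elsewhere that passes still-quoted strings to collapse_rfc2231_value directly would otherwise see different results.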
   
   It might make sense to add this as an option (at least initially) for the importer so that missing messages could be imported.
   
   Attached is some test code to demonstrate the fix.
   
   [parse_email.py.zip](https://github.com/apache/incubator-ponymail/files/5087155/parse_email.py.zip)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org