Posted to dev@ponymail.apache.org by sebb <se...@gmail.com> on 2016/11/23 12:44:33 UTC

Encoding issues

At present, the archiver and the importer handle character encoding
in different ways.

Thus the stored content of a message that was archived live may change
if it later has to be re-imported from an mbox file.
This should not happen.
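As an illustration only: the archiver and importer code is not shown here, so the sketch below uses two made-up decoding strategies (`archiver_decode` and `importer_decode` are hypothetical names, not PonyMail functions). It shows how the same raw bytes end up as different stored text depending on which path handled them.

```python
# Hypothetical strategies for illustration; NOT PonyMail's actual code.
raw = "Caf\u00e9".encode("iso-8859-1")  # b'Caf\xe9', i.e. "Café" in Latin-1


def archiver_decode(data: bytes) -> str:
    # Hypothetical strategy A: assume UTF-8 and replace undecodable bytes.
    return data.decode("utf-8", errors="replace")


def importer_decode(data: bytes) -> str:
    # Hypothetical strategy B: decode as Latin-1, which accepts any byte.
    return data.decode("iso-8859-1")


live = archiver_decode(raw)        # what a live archive would store
reimported = importer_decode(raw)  # what a re-import would store
print(live == reimported)  # False: same message, different database content
```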

Further, they both assume that the raw source of an e-mail can be
represented as a string in some encoding.

AFAICT, this is not the case: a MIME multipart e-mail may contain
several parts, each declared in a different charset.

If an entire e-mail is decoded using a single encoding, then some
parts of it may be mangled or lost.
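A minimal sketch using Python's standard `email` package, assuming a hand-built two-part message whose parts are sent as 8bit text, one in UTF-8 and one in ISO-8859-1. Decoding the whole raw message with one charset mangles whichever part does not match; decoding each part with the charset it declares recovers both. `decode_whole` and `decode_per_part` are illustrative names.

```python
import email

# Hand-built sample message (an assumption for illustration).
RAW = (
    b'MIME-Version: 1.0\n'
    b'Content-Type: multipart/mixed; boundary="XX"\n'
    b'\n'
    b'--XX\n'
    b'Content-Type: text/plain; charset="utf-8"\n'
    b'Content-Transfer-Encoding: 8bit\n'
    b'\n'
    + "Grüße\n".encode("utf-8")
    + b'--XX\n'
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b'Content-Transfer-Encoding: 8bit\n'
    b'\n'
    + "Café\n".encode("iso-8859-1")
    + b'--XX--\n'
)


def decode_whole(raw: bytes, charset: str) -> str:
    """Decode the entire message with one charset (loses information)."""
    return raw.decode(charset, errors="replace")


def decode_per_part(raw: bytes) -> list[str]:
    """Decode each text part with the charset that part declares."""
    msg = email.message_from_bytes(raw)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip the multipart container itself
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "us-ascii"
        texts.append(payload.decode(charset, errors="replace"))
    return texts


print("Café" in decode_whole(RAW, "utf-8"))         # False: Latin-1 part mangled
print("Grüße" in decode_whole(RAW, "iso-8859-1"))  # False: UTF-8 part mangled
print(decode_per_part(RAW))
```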

If each section of an e-mail is decoded using its declared charset and
converted to (say) UTF-8, then the charsets declared in the e-mail will
no longer apply to the version in the database. That is, if the source
is exported, re-importing it will not work properly unless the charset
declarations in the mail are ignored.
A mail that is imported and then exported may well differ from the
original.
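To make the failure concrete, here is a hypothetical pipeline (not PonyMail's code) that stores a part's text as UTF-8 but exports it under the part's original, unchanged `Content-Type` charset. A re-import that honours the declared charset then misreads the body, and the exported bytes no longer match the original.

```python
# Hypothetical import/export pipeline for illustration only.
original_body = "Café".encode("iso-8859-1")  # bytes as received
declared_charset = "iso-8859-1"              # from the part's Content-Type

# Import: decode with the declared charset, then store as UTF-8.
stored = original_body.decode(declared_charset).encode("utf-8")

# Export: write the stored bytes back out under the unchanged header.
exported_body = stored

# Re-import: trusting the (now wrong) declared charset garbles the text.
reimported_text = exported_body.decode(declared_charset)
print(reimported_text)                  # CafÃ©
print(exported_body == original_body)  # False
```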

If the mbox export function of PonyMail is to be of any use, it's
vital that the original mail contents can be recovered. I don't think
that is the case at present.
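The requirement can be phrased as a round-trip check: exporting an imported mail must give back exactly the original bytes. The sketch below assumes a hypothetical store that keeps the untouched source, base64-encoded so arbitrary bytes survive a text-only document store; `import_mail`, `export_mail` and the `source` field are illustrative names, not PonyMail's.

```python
import base64


def import_mail(raw: bytes) -> dict:
    # Hypothetical: keep the raw source verbatim, base64-encoded for storage.
    return {"source": base64.b64encode(raw).decode("ascii")}


def export_mail(doc: dict) -> bytes:
    # Hypothetical: recover the exact original bytes for mbox export.
    return base64.b64decode(doc["source"])


def round_trips(raw: bytes) -> bool:
    """True if export(import(raw)) reproduces the original byte for byte."""
    return export_mail(import_mail(raw)) == raw


sample = b"Subject: test\n\n" + "Café\n".encode("iso-8859-1")
print(round_trips(sample))  # True
```

Any decoded, normalised view used for display or search could be derived from `source` separately without affecting what is exported.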