Posted to dev@ponymail.apache.org by GitBox <gi...@apache.org> on 2020/09/03 22:25:32 UTC

[GitHub] [incubator-ponymail-foal] sebbASF opened a new issue #5: Bug: code should not fail to handle html-only mails when html2text is absent

sebbASF opened a new issue #5:
URL: https://github.com/apache/incubator-ponymail-foal/issues/5


   The code should be able to parse and store all emails even if they are HTML-only and html2text is not available.
   
   Also, the generated permalink should not depend on whether the --html option was used.
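A minimal sketch of the requested behaviour, using only the standard-library `email` package. `pick_body` and its `converter` parameter are hypothetical and are not Foal's API. `converter` stands in for html2text and may be `None` when the module is absent. In that case the HTML source is kept and flagged, and the message is still stored.

~~~python
import email
from email import policy
from typing import Callable, Optional, Tuple


def pick_body(raw: bytes,
              converter: Optional[Callable[[str], str]] = None) -> Tuple[str, bool]:
    """Return (body_text, html_as_source) for a raw RFC 5322 message.

    Hypothetical helper: prefers the first inline text/plain part, otherwise
    falls back to the first inline text/html part, converting it only when a
    converter (such as html2text) is available.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    first_html = None
    for part in msg.walk():
        # Skip multipart containers and attachments; only inline text counts.
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain":
            return part.get_content(), False
        if ctype == "text/html" and first_html is None:
            first_html = part.get_content()
    if first_html is None:
        return "", False
    if converter is not None:
        return converter(first_html), False
    # No converter available: keep the HTML source untouched and flag it,
    # rather than failing or dropping the message.
    return first_html, True
~~~

With no converter, an HTML-only message comes back as its HTML source together with `True`. If a converter is passed, the converted text comes back with `False`.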


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ponymail-foal] Humbedooh commented on issue #5: Bug: code should not fail to handle html-only mails when html2text is absent

Posted by GitBox <gi...@apache.org>.
Humbedooh commented on issue #5:
URL: https://github.com/apache/incubator-ponymail-foal/issues/5#issuecomment-687599931


   To make the unit tests align with the old Pony Mail, we'll need to both detect this "HTML source only" status and inject it into the resulting JSON:
   
   ~~~diff
   @@ -415,13 +420,16 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
            if body is not None or attachments:
                pmid = mid
                id_set = set()  # Use a set to avoid duplicates
   +            which_body = body if body and body.character_set else body and body.bytes or ""
   +            if body and body.html_as_source:
   +                which_body = ""
                for generator in self.generator.split(" "):
                    if generator:
                        try:
                            mid = plugins.generators.generate(
                                generator,
                                msg,
   -                            body if body and body.character_set else body and body.bytes or "",
   +                            which_body,
                                lid,
                                attachments,
                                raw_msg,
   @@ -469,6 +477,7 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                    "references": msg_metadata["references"],
                    "in-reply-to": irt,
                    "body": body.unflow() if body else "",
   +                "html_source_only": bool(body and body.html_as_source),
                    "attachments": attachments,
                }
   ~~~
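The branch being patched is also entered for attachment-only mails, where `body` is `None`, so both the generator input and the new flag have to be derived defensively. The following is a small, None-safe sketch of the same selection logic. `Body` here is a hypothetical stand-in exposing only the three attributes the diff uses (`character_set`, `bytes`, `html_as_source`), and the two helper functions are illustrative, not Foal's code.

~~~python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Body:
    """Hypothetical stand-in exposing only the attributes the diff uses."""
    character_set: Optional[str]
    bytes: bytes
    html_as_source: bool = False


def generator_input(body: Optional[Body]) -> Union[Body, bytes, str]:
    """Mirror of `which_body`: what the permalink generators receive."""
    if body is None or body.html_as_source:
        # No body, or raw HTML kept as-is: body-based generators get "".
        return ""
    # Decoded body when a charset is known, else the raw bytes (or "").
    return body if body.character_set else (body.bytes or "")


def html_source_only(body: Optional[Body]) -> bool:
    """Value for the "html_source_only" field of the stored JSON document."""
    return body is not None and body.html_as_source
~~~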
   
   Then, in `test-parsing.py`, we need this:
   
   ~~~diff
                    json = archie.compute_updates(fake_args, lid, False, message, message_raw)
                    body_sha3_256 = None
                    if json and json.get('body') is not None:
   -                    body_sha3_256 = hashlib.sha3_256(json['body'].encode('utf-8')).hexdigest()
   +                    if not json.get('html_source_only'):
   +                        body_sha3_256 = hashlib.sha3_256(json['body'].encode('utf-8')).hexdigest()
                    if body_sha3_256 != test['body_sha3_256']:
                        errors += 1
                        sys.stderr.write("""[FAIL] parsing index %2u: Expected: %s Got: %s\n""" %
   ~~~
   
   This puts some limitations on the unit tests, since the body hash is left as `None` for HTML-source-only messages. Perhaps we need to expand the YAML settings to allow for differences between the two versions.
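One possible shape for that YAML expansion, sketched in Python with a plain dict standing in for a parsed test entry. Each entry could carry a second expected hash for the HTML-source case. The key `body_sha3_256_html_source` and both helper functions are hypothetical. Only `body_sha3_256` and `html_source_only` come from the diffs above. When no HTML-source hash is recorded, the check is skipped, so existing entries are not failed.

~~~python
import hashlib
from typing import Optional


def body_digest(doc: dict) -> Optional[str]:
    """SHA3-256 hex digest of the stored body, or None if there is none."""
    body = doc.get("body")
    return None if body is None else hashlib.sha3_256(body.encode("utf-8")).hexdigest()


def body_matches(doc: dict, test: dict) -> bool:
    """Compare a parsed document against one (hypothetical) YAML test entry.

    Converted/plain bodies are checked against `body_sha3_256`, as today.
    HTML-source-only bodies are checked against the invented key
    `body_sha3_256_html_source` when present, and skipped otherwise.
    """
    if doc.get("html_source_only"):
        expected = test.get("body_sha3_256_html_source")
        return expected is None or body_digest(doc) == expected
    return body_digest(doc) == test.get("body_sha3_256")
~~~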


----------------------------------------------------------------



[GitHub] [incubator-ponymail-foal] Humbedooh commented on issue #5: Bug: code should not fail to handle html-only mails when html2text is absent

Posted by GitBox <gi...@apache.org>.
Humbedooh commented on issue #5:
URL: https://github.com/apache/incubator-ponymail-foal/issues/5#issuecomment-687595589


   Whether the permalink ID depends on it is up to the generator.
   For the modern ones (full, dkim), it won't matter, as they use the full message bytes. For the older ones, we might have to keep it dependent on --html so as not to break older installations.
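To illustrate why the generator matters, compare two hypothetical ID functions. Neither is Foal's actual `full`, `dkim` or legacy generator, and the hash choices are arbitrary. One hashes the raw message bytes. The other hashes whatever body text was selected for the message, so it changes when `--html` changes that selection.

~~~python
import hashlib


def id_from_raw(raw_msg: bytes, list_id: str) -> str:
    """Hypothetical 'full'-style ID: depends only on the raw bytes."""
    return hashlib.sha3_256(raw_msg + list_id.encode("utf-8")).hexdigest()


def id_from_body(body: str, list_id: str) -> str:
    """Hypothetical legacy-style ID: depends on the parsed body text."""
    return hashlib.sha224((body + list_id).encode("utf-8")).hexdigest()
~~~

For the same HTML-only mail, `id_from_body` gives one ID when the body is html2text output and another when it is the HTML source (or an empty string), while `id_from_raw` is unaffected by either choice.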


----------------------------------------------------------------



[GitHub] [incubator-ponymail-foal] Humbedooh commented on issue #5: Bug: code should not fail to handle html-only mails when html2text is absent

Posted by GitBox <gi...@apache.org>.
Humbedooh commented on issue #5:
URL: https://github.com/apache/incubator-ponymail-foal/issues/5#issuecomment-687749034


   I have implemented the above in the code, and it works as expected.
   Foal now stores the HTML source as-is when no HTML-to-text converter is enabled. It adds an `html_source_only` field (set from `body.html_as_source`) so that the parsing unit tests pass, but that's something we should probably make more configurable in the unit test YAML.


----------------------------------------------------------------



[GitHub] [incubator-ponymail-foal] Humbedooh closed issue #5: Bug: code should not fail to handle html-only mails when html2text is absent

Posted by GitBox <gi...@apache.org>.
Humbedooh closed issue #5:
URL: https://github.com/apache/incubator-ponymail-foal/issues/5


   


----------------------------------------------------------------



[GitHub] [incubator-ponymail-foal] Humbedooh commented on issue #5: Bug: code should not fail to handle html-only mails when html2text is absent

Posted by GitBox <gi...@apache.org>.
Humbedooh commented on issue #5:
URL: https://github.com/apache/incubator-ponymail-foal/issues/5#issuecomment-687595402


   The first part is easy to do. It will break three unit tests, but we can fix that:
   ~~~diff
   @@ -312,8 +312,7 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                    ]:
                        body = Body(part)
                    elif (
   -                    self.html
   -                    and not first_html
   +                    not first_html
                        and part.get_content_type() == "text/html"
                    ):
                        first_html = Body(part)
   @@ -327,7 +326,9 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                or (self.ignore_body and str(body).find(str(self.ignore_body)) != -1)
            ):
                body = first_html
   -            body.assign(self.html2text(str(body)))
   +            # Convert HTML to text if the module is installed and enabled; otherwise, keep the source as-is
   +            if self.html:
   +                body.assign(self.html2text(str(body)))
            return body
    
        # N.B. this is also called by import-mbox.py
   ~~~
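The patched fallback can be summarised with a minimal stand-in for `Body` that also records the `html_as_source` flag the other comments rely on. This diff does not show where Foal actually sets that flag. `Body`, `finalize_body` and the `html2text` parameter below are hypothetical, and the "no usable text body" test is simplified compared with the real condition, which also honours `ignore_body`.

~~~python
from typing import Callable, Optional


class Body:
    """Hypothetical minimal Body: text plus a flag for unconverted HTML."""

    def __init__(self, text: str):
        self.text = text
        self.html_as_source = False

    def assign(self, new_text: str) -> None:
        """Replace the content (e.g. with html2text output)."""
        self.text = new_text

    def __str__(self) -> str:
        return self.text


def finalize_body(body: Optional[Body], first_html: Optional[Body],
                  html2text: Optional[Callable[[str], str]]) -> Optional[Body]:
    """Fall back to the first HTML part when there is no usable text body."""
    if (body is None or not str(body).strip()) and first_html is not None:
        body = first_html
        if html2text is not None:
            body.assign(html2text(str(body)))
        else:
            # Conversion unavailable or disabled: keep the HTML source as-is.
            body.html_as_source = True
    return body
~~~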


----------------------------------------------------------------