Posted to server-dev@james.apache.org by "Stefano Bagnara (JIRA)" <se...@james.apache.org> on 2008/07/18 13:22:33 UTC

[jira] Created: (MIME4J-58) Lenient dealing with headless messages or malformed header/body separation

Lenient dealing with headless messages or malformed header/body separation
--------------------------------------------------------------------------

                 Key: MIME4J-58
                 URL: https://issues.apache.org/jira/browse/MIME4J-58
             Project: Mime4j
          Issue Type: Task
    Affects Versions: 0.3
            Reporter: Stefano Bagnara
             Fix For: 0.5


Define how to deal with non-canonical messages like this one:
-----------------------
This is a simple message not having headers.
The whole text should be recognized as body.
-----------------------
or this one:
-----------------------
Subject: this is a subject
This is an invalid header
AnotherHeader: is this an header or the first part of the body?

Body text
-----------------------

In the first case mime4j outputs an "invalid header" error twice, and a round-trip write results in an empty message.
In the SMTP case this is unfortunate because messages are sometimes sent without headers.

In the second case mime4j currently takes Subject and AnotherHeader as headers, "This is an invalid header" raises a monitor event for "invalid header", and "Body text" is considered the body.
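
A minimal sketch of the behaviour described above, written in Python purely for illustration; it is not mime4j code. The function name `parse_skipping_invalid` and the `FIELD` pattern are hypothetical, and the pattern (printable ASCII without a colon, followed by a colon) is only an assumption about what counts as "HeaderName: something".

```python
import re

# Assumed shape of a header line: a name made of printable ASCII
# characters other than ':', immediately followed by ':'.
FIELD = re.compile(r"^[!-9;-~]+:")


def parse_skipping_invalid(raw):
    """Report invalid header lines, skip them, and keep reading
    headers until an empty line (the CRLFCRLF separator) is found."""
    lines = raw.split("\r\n")
    headers, warnings = [], []
    i = 0
    while i < len(lines) and lines[i] != "":
        line = lines[i]
        if line[0] in " \t" and headers:
            headers[-1] += "\r\n" + line      # folded continuation line
        elif FIELD.match(line):
            headers.append(line)
        else:
            warnings.append("invalid header: " + line)
        i += 1
    # The body is whatever follows the empty line, if there is one.
    body = "\r\n".join(lines[i + 1:])
    return headers, body, warnings
```

Under these assumptions the headerless message produces two warnings and an empty body, which is consistent with the empty round-trip result reported above. The second message keeps both Subject and AnotherHeader as headers and produces one warning.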

A compromise between compliance, leniency and performance that we evaluated in the past was to "alter" the requirement for CRLFCRLF between headers and body with a different rule: if, while parsing the headers, we find a line that is not a continuation of a multi-line header and does not match "HeaderName: something", then we virtually add a CRLF *before* that line and treat it as the first line of the body. This lets us buffer only a single line (as opposed to scanning the whole message for a CRLFCRLF and treating the full message as body if none is found) while being very lenient with input. The "side effect" (maybe not bad) is that an invalid header in the middle of the headers will result in that line and every header after it being moved to the body.

With this algorithm the above would be "virtually" parsed as if it were:
-----------------------

This is a simple message not having headers.
The whole text should be recognized as body.
-----------------------
or this one:
-----------------------
Subject: this is a subject

This is an invalid header
AnotherHeader: is this an header or the first part of the body?

Body text
-----------------------
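
The rule and the two results above can be sketched as follows. This is an illustration in Python under stated assumptions, not the mime4j implementation: `split_lenient` and `FIELD` are hypothetical names, and the header pattern is an assumed reading of "HeaderName: something". The sketch works on a whole string for brevity, but the decision for each line depends only on that line, which is why a streaming parser would need to buffer just one line.

```python
import re

# Assumed shape of a header line: a name made of printable ASCII
# characters other than ':', immediately followed by ':'.
FIELD = re.compile(r"^[!-9;-~]+:")


def split_lenient(raw):
    """Split raw into (headers, body). The first line that is neither
    a header nor a folded continuation starts the body, as if a CRLF
    had been inserted before it."""
    lines = raw.split("\r\n")
    headers = []
    for i, line in enumerate(lines):
        if line == "":
            # Regular separator: the body starts after the empty line.
            return headers, "\r\n".join(lines[i + 1:])
        if line[0] in " \t" and headers:
            headers[-1] += "\r\n" + line      # folded continuation line
        elif FIELD.match(line):
            headers.append(line)
        else:
            # Virtual CRLF before this line: it is the first body line.
            return headers, "\r\n".join(lines[i:])
    return headers, ""                        # headers only, no body
```

With this sketch the headerless message comes back unchanged as the body, and the second message keeps only Subject as a header, with "This is an invalid header" as the first body line.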

If we think in terms of strict and lenient approaches, I think the current mime4j result is OK for strict parsing, while the one I propose is a good lenient alternative.

Opinions? Alternatives?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


[jira] Commented: (MIME4J-58) Lenient dealing with headless messages or malformed header/body separation

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615803#action_12615803 ] 

Robert Burrell Donkin commented on MIME4J-58:
---------------------------------------------

Not sure there's a perfect solution for messages of this type, but this sounds like a reasonable approach to me.

I think we should be able to implement this by adding the behaviour to the appropriate monitor method. This would allow subclasses to change it easily if desired.

Robert




[jira] Commented: (MIME4J-58) Lenient dealing with headless messages or malformed header/body separation

Posted by "Robert Burrell Donkin (JIRA)" <se...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617177#action_12617177 ] 

Robert Burrell Donkin commented on MIME4J-58:
---------------------------------------------

I took a look but don't yet have an elegant way to solve the push-back problem. If your refactoring includes proposals for the pull parser then I suggest we wait until they're ready.




[jira] Commented: (MIME4J-58) Lenient dealing with headless messages or malformed header/body separation

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.
    [ https://issues.apache.org/jira/browse/MIME4J-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616181#action_12616181 ] 

Stefano Bagnara commented on MIME4J-58:
---------------------------------------

Using the monitor for this sounds like a good plan, but I'm not sure how to easily push back the bad header to the body stream. I hope you will give this a go and show us some code.
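
One generic way to picture the push-back problem is a line reader that can take back a single line, so that the header loop can return the offending line before the body is read. The sketch below is in Python and only illustrates the idea; `PushbackLines`, `unread`, `read_headers` and the `FIELD` pattern are hypothetical names, not part of mime4j, and nothing here says how the monitor or the stream classes would have to change.

```python
import re

# Assumed shape of a header line: a name made of printable ASCII
# characters other than ':', immediately followed by ':'.
FIELD = re.compile(r"^[!-9;-~]+:")


class PushbackLines:
    """Iterator over lines that lets the caller un-read one line."""

    def __init__(self, lines):
        self._it = iter(lines)
        self._pending = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending is not None:
            line, self._pending = self._pending, None
            return line
        return next(self._it)

    def unread(self, line):
        if self._pending is not None:
            raise ValueError("only one line of push-back is supported")
        self._pending = line


def read_headers(reader):
    """Consume header lines; push a non-header line back for the body."""
    headers = []
    for line in reader:
        if line == "":
            break                             # regular separator, consumed
        if line[0] in " \t" and headers:
            headers[-1] += "\r\n" + line      # folded continuation line
        elif FIELD.match(line):
            headers.append(line)
        else:
            reader.unread(line)               # first line of the body
            break
    return headers
```

After `read_headers` returns, iterating over the same reader yields the body, starting with the pushed-back line. In Java, `java.io.PushbackInputStream` offers byte-level push-back of a similar kind; whether it fits the parser's stream design is not something this thread settles.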




[jira] Updated: (MIME4J-58) Lenient dealing with headless messages or malformed header/body separation

Posted by "Stefano Bagnara (JIRA)" <se...@james.apache.org>.
     [ https://issues.apache.org/jira/browse/MIME4J-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefano Bagnara updated MIME4J-58:
----------------------------------

    Attachment: headerbody-noheader.msg
                headerbody-nocrlfcrlf.msg

The 2 test messages I referred to in the issue.
We should define the expected XML for that input.

