Posted to mime4j-dev@james.apache.org by "Thomas Fricker (Jira)" <mi...@james.apache.org> on 2022/05/18 14:17:00 UTC

[jira] [Updated] (MIME4J-316) Parts missing in case of a specific combination of boundaries

     [ https://issues.apache.org/jira/browse/MIME4J-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Fricker updated MIME4J-316:
----------------------------------
    Description: 
The problem can be reproduced by parsing a very specific email structure, in which
an inner boundary consists of an outer boundary followed by a "-" character and further characters (here, "-1").

In the following example, the attached PDF file is ignored by the parser.

 
{code:java}
Content-Type: multipart/mixed;
    boundary="--boundary.1652331600846930886"
----boundary.1652331600846930886
Content-Type: multipart/alternative;
    boundary="--boundary.1652331600846930886-1"
----boundary.1652331600846930886-1
Content-Type: text/plain; charset=utf-8
sometext
----boundary.1652331600846930886-1
Content-Type: text/html; charset=utf-8
<html lang="en">
    <body>
    </body>
</html>
----boundary.1652331600846930886-1--
----boundary.1652331600846930886
Content-Type: application/pdf;
    name="test.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: Attachment;
    filename="test.pdf"
JVBERi0xLj4Kc3RhcnR4cmVmCjUzNjEwCiUlRU9GCgshortened==
----boundary.1652331600846930886--
{code}
Dumping the EntityState during parsing produces:
{code:java}
State: T_START_MULTIPART
State: T_PREAMBLE
State: T_END_MULTIPART
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/plain; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 43][limit: 103]....]], header data = [mimeType=text/plain, mediaType=text, subType=plain, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/html; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 42][limit: 313]], header data = [mimeType=text/html, mediaType=text, subType=html, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_EPILOGUE
State: T_END_MULTIPART
State: T_END_MESSAGE {code}
The PDF attachment is missing.
I proposed the following fix: [https://github.com/apache/james-mime4j/pull/71],
which produces the following structure:
{code:java}
State: T_START_MULTIPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: multipart/alternative;
	boundary="--boundary.1652331600846930886-1"
State: T_END_HEADER
Multipart message detected, header data = [mimeType=multipart/alternative, mediaType=multipart, subType=alternative, boundary=--boundary.1652331600846930886-1, charset=null]
State: T_START_MULTIPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/plain; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 43][limit: 103]], header data = [mimeType=text/plain, mediaType=text, subType=plain, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/html; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 42][limit: 313]]], header data = [mimeType=text/html, mediaType=text, subType=html, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_END_MULTIPART
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: application/pdf;
	name="test.pdf"
Header field detected: Content-Transfer-Encoding: base64
Header field detected: Content-Disposition: Attachment;
	filename="test.pdf"
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 189][limit: 235][JVBERi0xLj4Kc3RhcnR4cmVmCjUzNjEwCiUlRU9GCg==
]], header data = [mimeType=application/pdf, mediaType=application, subType=pdf, boundary=null, charset=null]
State: T_END_BODYPART
State: T_END_MULTIPART
State: T_END_MESSAGE {code}
I shortened the output of the body parts.
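
To illustrate why this boundary combination is delicate, here is a minimal, self-contained sketch. It is not the mime4j implementation and not the code from the pull request; the class BoundaryLines, the enum Kind and both methods are hypothetical names made up for this example. It only assumes the RFC 2046 rule that a delimiter line is "--" plus the boundary value, optionally followed by "--" for the closing delimiter, with nothing but whitespace after that. A check that looks at just one character after the outer boundary would mistake the inner delimiter line for the outer closing delimiter, which may be similar to what happens here.

```java
final class BoundaryLines {

    enum Kind { NOT_A_BOUNDARY, DELIMITER, CLOSE_DELIMITER }

    // Hypothetical naive check: any "-" directly after the delimiter
    // is taken as the start of the closing "--".
    static Kind classifyNaive(String line, String boundary) {
        String delimiter = "--" + boundary;
        if (!line.startsWith(delimiter)) {
            return Kind.NOT_A_BOUNDARY;
        }
        String rest = line.substring(delimiter.length());
        return rest.startsWith("-") ? Kind.CLOSE_DELIMITER : Kind.DELIMITER;
    }

    // Stricter check: after the delimiter only whitespace, or "--"
    // followed by whitespace, is accepted.
    static Kind classifyStrict(String line, String boundary) {
        String delimiter = "--" + boundary;
        if (!line.startsWith(delimiter)) {
            return Kind.NOT_A_BOUNDARY;
        }
        String rest = line.substring(delimiter.length());
        if (rest.trim().isEmpty()) {
            return Kind.DELIMITER;
        }
        if (rest.startsWith("--") && rest.substring(2).trim().isEmpty()) {
            return Kind.CLOSE_DELIMITER;
        }
        return Kind.NOT_A_BOUNDARY;
    }

    public static void main(String[] args) {
        // Boundary values taken from the example message above.
        String outer = "--boundary.1652331600846930886";
        String innerDelimiterLine = "----boundary.1652331600846930886-1";

        // Prints CLOSE_DELIMITER: the inner delimiter is mistaken
        // for the end of the outer multipart.
        System.out.println(classifyNaive(innerDelimiterLine, outer));

        // Prints NOT_A_BOUNDARY: the line does not belong to the outer boundary.
        System.out.println(classifyStrict(innerDelimiterLine, outer));
    }
}
```

With the strict variant, the line "----boundary.1652331600846930886-1" is only recognized when it is compared against the inner boundary value, so the outer multipart stays open until its real closing delimiter is read.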

 

  was:
The problem can be reproduced by parsing a very specific email structure, in which
an inner boundary consists of an outer boundary followed by a "-" character and further characters (here, "-1").

In the following example, the attached PDF file is ignored by the parser.

 
{code:java}

Content-Type: multipart/mixed;
    boundary="--boundary.1652331600846930886"
----boundary.1652331600846930886
Content-Type: multipart/alternative;
    boundary="--boundary.1652331600846930886-1"
----boundary.1652331600846930886-1
Content-Type: text/plain; charset=utf-8
sometext
----boundary.1652331600846930886-1
Content-Type: text/html; charset=utf-8
<html lang="en">
    <body>
    </body>
</html>
----boundary.1652331600846930886-1--
----boundary.1652331600846930886
Content-Type: application/pdf;
    name="test.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: Attachment;
    filename="test.pdf"
JVBERi0xLj4Kc3RhcnR4cmVmCjUzNjEwCiUlRU9GCgshortened==
----boundary.1652331600846930886--
{code}
Dumping the EntityState during parsing produces:
{code:java}
State: T_START_MULTIPART
State: T_PREAMBLE
State: T_END_MULTIPART
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/plain; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 43][limit: 103]....]], header data = [mimeType=text/plain, mediaType=text, subType=plain, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/html; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 42][limit: 313]], header data = [mimeType=text/html, mediaType=text, subType=html, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_EPILOGUE
State: T_END_MULTIPART
State: T_END_MESSAGE {code}
The PDF attachment is missing.
I proposed the following fix: [https://github.com/apache/james-mime4j/pull/71],
which produces the following structure:
{code:java}
State: T_START_MULTIPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: multipart/alternative;
	boundary="--boundary.1652331600846930886-1"
State: T_END_HEADER
Multipart message detected, header data = [mimeType=multipart/alternative, mediaType=multipart, subType=alternative, boundary=--boundary.1652331600846930886-1, charset=null]
State: T_START_MULTIPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/plain; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 43][limit: 103]], header data = [mimeType=text/plain, mediaType=text, subType=plain, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: text/html; charset=utf-8
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 42][limit: 313]]], header data = [mimeType=text/html, mediaType=text, subType=html, boundary=null, charset=utf-8]
State: T_END_BODYPART
State: T_END_MULTIPART
State: T_END_BODYPART
State: T_START_BODYPART
State: T_START_HEADER
Header field detected: Content-Type: application/pdf;
	name="Daily_Stats-2022-05-12-0700.pdf"
Header field detected: Content-Transfer-Encoding: base64
Header field detected: Content-Disposition: Attachment;
	filename="Daily_Stats-2022-05-12-0700.pdf"
State: T_END_HEADER
Body detected, contents = [LineReaderInputStreamAdaptor: [pos: 189][limit: 235][JVBERi0xLj4Kc3RhcnR4cmVmCjUzNjEwCiUlRU9GCg==
]], header data = [mimeType=application/pdf, mediaType=application, subType=pdf, boundary=null, charset=null]
State: T_END_BODYPART
State: T_END_MULTIPART
State: T_END_MESSAGE {code}
I shortened the output of the body parts.

 


> Parts missing in case of a specific combination of boundaries 
> --------------------------------------------------------------
>
>                 Key: MIME4J-316
>                 URL: https://issues.apache.org/jira/browse/MIME4J-316
>             Project: James Mime4j
>          Issue Type: Bug
>          Components: parser (core)
>    Affects Versions: 0.7.2
>            Reporter: Thomas Fricker
>            Priority: Major


