Posted to mime4j-dev@james.apache.org by Thomas Ehardt <th...@ehardt.net> on 2020/06/01 16:45:34 UTC

Using Mime4J to (only) repair bad headers

I typically use JavaMail to parse eml files, but it is not terribly
forgiving. I've looked at using Mime4J in some situations, most notably
when there are invalid headers, and its leniency is great!

For whatever reason, we sometimes get messages where date fields do not
have quotes around them. For example:

Content-Type: text/plain; name="attachment.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
                filename="attachment.txt";
                size=64;
                creation-date=Sat, 30 Apr 2005 19:28:29 -0300;
                modification-date=Sat, 30 Apr 2005 19:28:29 -0300

JavaMail cannot parse these, but Mime4J can, and with the DOM APIs it
easily rewrites these headers to be compliant. However, the DOM APIs
sometimes modify other parts of the source message (this seems to be
related to parts that are labeled "quoted-printable" but are not actually
encoded that way), so I've started looking at the streaming components.

Ideally, I would like to leave the original message as-is, even if it is
otherwise not correct, except for these headers (either rewriting all
headers or just Content-Disposition, which appears to be the only place this
issue occurs).
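For the defect shown above, RFC 2183 defines creation-date, modification-date and read-date as quoted-strings (the values contain commas and colons), so the repair itself amounts to adding the missing quotes. The following is a minimal sketch that does not use Mime4J at all. It swaps in a plain regular-expression rewrite and assumes it is given one unfolded Content-Disposition value. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: quotes unquoted RFC 2183 date parameters in one
// unfolded Content-Disposition value; everything else is copied unchanged.
public final class ContentDispositionRepair {

    // creation-date / modification-date / read-date whose value does not begin
    // with a double quote; the value is taken to run up to the next ';'.
    private static final Pattern UNQUOTED_DATE = Pattern.compile(
            "(?i)(?<![\\w-])((?:creation|modification|read)-date)\\s*=\\s*([^\"\\s;][^;]*)");

    private ContentDispositionRepair() {
    }

    public static String quoteDateParams(String value) {
        Matcher m = UNQUOTED_DATE.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Drop whitespace before the next ';' and wrap the date in quotes.
            String quoted = m.group(1) + "=\"" + m.group(2).trim() + "\"";
            m.appendReplacement(sb, Matcher.quoteReplacement(quoted));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "attachment; filename=\"attachment.txt\"; size=64; "
                + "creation-date=Sat, 30 Apr 2005 19:28:29 -0300; "
                + "modification-date=Sat, 30 Apr 2005 19:28:29 -0300";
        System.out.println(quoteDateParams(value));
    }
}
```

Already-quoted values do not match the pattern, so running the helper twice is harmless. A real fix would still need to locate each part's Content-Disposition header without disturbing the rest of the message.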

Does anyone have an example of how to do such a modification (or something
close enough, such as using a stream parser to make a copy of the original
message)?

Thanks in advance!

Re: Using Mime4J to (only) repair bad headers

Posted by Tellier Benoit <bt...@apache.org>.
Hi Thomas,

I think having a "no-modification message writer" directly as part of
MIME4J would be a great plus.

Do you think you can contribute it, if relevant?

Cheers,

Benoit

On 02/06/2020 07:02, Thomas Ehardt wrote:
> It turns out, my original code (using the DOM APIs) was fine; the culprit
> was the MessageWriter performing encoding where I don't want it to:
> https://github.com/apache/james-mime4j/blob/d7643b9434dfd7897c41fb2d69d28db1bf13ef2f/dom/src/main/java/org/apache/james/mime4j/message/DefaultMessageWriter.java#L234-L235
>
> I created a custom MessageWriter that doesn't perform this conversion, and
> I have the exact solution I was looking for!

Re: Using Mime4J to (only) repair bad headers

Posted by Thomas Ehardt <th...@ehardt.net>.
It turns out, my original code (using the DOM APIs) was fine; the culprit
was the MessageWriter performing encoding where I don't want it to:
https://github.com/apache/james-mime4j/blob/d7643b9434dfd7897c41fb2d69d28db1bf13ef2f/dom/src/main/java/org/apache/james/mime4j/message/DefaultMessageWriter.java#L234-L235

I created a custom MessageWriter that doesn't perform this conversion, and
I have the exact solution I was looking for!
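The thread does not include the custom writer itself. Below is a minimal sketch of the idea, assuming Mime4J 0.8.x. In that version, DefaultMessageWriter has a protected encodeStream(OutputStream, String, boolean) hook that wraps each body in a base64 or quoted-printable encoder. The sketch overrides that hook to return the stream unchanged and parses with content decoding turned off, so bodies stay in their original transfer encoding. The class name PassThroughMessageWriter is hypothetical, and this is not necessarily what Thomas wrote.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.dom.Message;
import org.apache.james.mime4j.message.DefaultMessageBuilder;
import org.apache.james.mime4j.message.DefaultMessageWriter;

// Hypothetical "no re-encoding" writer (assumes Mime4J 0.8.x). DefaultMessageWriter
// normally wraps each body in an encoder chosen from the part's
// Content-Transfer-Encoding; this subclass writes body bytes untouched.
public class PassThroughMessageWriter extends DefaultMessageWriter {

    @Override
    protected OutputStream encodeStream(OutputStream out, String encoding,
            boolean binaryBody) throws IOException {
        // No encoder: bodies are expected to still be in their original encoding.
        return out;
    }

    // Parses with the builder's defaults, keeps bodies undecoded, writes them back.
    public static void copy(InputStream in, OutputStream out)
            throws IOException, MimeException {
        DefaultMessageBuilder builder = new DefaultMessageBuilder();
        // Without this, bodies are decoded on parse and would be written back
        // decoded while their headers still claim base64/quoted-printable.
        builder.setContentDecoding(false);
        Message message = builder.parseMessage(in);
        new PassThroughMessageWriter().writeMessage(message, out);
    }
}
```

The two settings belong together. Skipping re-encoding while still decoding on parse would produce bodies that contradict their own Content-Transfer-Encoding headers.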

On Mon, Jun 1, 2020 at 3:10 PM Eugen Stan <st...@gmail.com> wrote:

> Hi Thomas,
>
> I'm not familiar with this code, but have you tried checking the examples?
> Also have a look at MimeTokenStream in the core package, its JavaDocs, and the tests.

Re: Using Mime4J to (only) repair bad headers

Posted by Eugen Stan <st...@gmail.com>.
Hi Thomas,

I'm not familiar with this code, but have you tried checking the examples?
Also have a look at MimeTokenStream in the core package, its JavaDocs, and the tests.

https://github.com/apache/james-mime4j/blob/master/examples/src/main/java/org/apache/james/mime4j/samples/transform/TransformMessage.java

https://github.com/apache/james-mime4j/blob/master/core/src/main/java/org/apache/james/mime4j/stream/MimeTokenStream.java

https://github.com/apache/james-mime4j/blob/master/core/src/test/java/org/apache/james/mime4j/stream/MimeTokenStreamTest.java

https://github.com/apache/james-mime4j/blob/master/core/src/test/java/org/apache/james/mime4j/stream/MimeTokenStreamReaderTest.java

https://github.com/apache/james-mime4j/blob/master/core/src/test/java/org/apache/james/mime4j/stream/StrictMimeTokenStreamTest.java
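The following is a minimal sketch of the streaming route, assuming Mime4J 0.8.x. It walks a MimeTokenStream, which descends into multipart bodies, and collects the body of every Content-Disposition field. This is one way to find the parts that need repair without building a DOM. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.stream.EntityState;
import org.apache.james.mime4j.stream.Field;
import org.apache.james.mime4j.stream.MimeTokenStream;

// Hypothetical helper (assumes Mime4J 0.8.x): collects the body of every
// Content-Disposition field found anywhere in the message.
public final class DispositionScanner {

    private DispositionScanner() {
    }

    public static List<String> contentDispositions(InputStream in)
            throws IOException, MimeException {
        MimeTokenStream stream = new MimeTokenStream();
        stream.parse(in);
        List<String> found = new ArrayList<String>();
        // Advance token by token until the parser reports the end of input.
        for (EntityState state = stream.getState();
                state != EntityState.T_END_OF_STREAM;
                state = stream.next()) {
            if (state == EntityState.T_FIELD) {
                Field field = stream.getField();
                if ("Content-Disposition".equalsIgnoreCase(field.getName())) {
                    found.add(field.getBody());
                }
            }
        }
        return found;
    }
}
```

This only reads the message. Producing a modified copy through the token stream would also mean re-emitting boundaries, preamble and epilogue, which the DOM approach with a custom writer handles for you.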
