Posted to mime4j-dev@james.apache.org by Lasse Lindgård <la...@lldata.dk> on 2020/10/01 13:42:48 UTC

Byte offsets for attachments

Hi mime4j experts,

I got a requirement to deliver emails to a legacy system that needs to read
the attachments.

For each part in a multipart email I need to provide the byte offset for
where the attachment starts in the email, so the legacy system doesn't need
to know how to parse emails.

Performance and memory usage are an issue: the solution can't load the
entire email into memory, so I use the MimeTokenStream.

My intuition tells me that if mime4j can return an InputStream for the
current attachment body, getting the byte offset should be easy. But byte
offsets are not tracked. The library seems to be aware of line numbers
internally, but they are not exposed, and I suspect they are "off" because
the streams are buffered.
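Because mime4j does not expose byte offsets, one possible workaround is a separate single pass over the raw message bytes that counts offsets itself, so buffering no longer matters. The sketch below is an assumption-laden illustration, not mime4j API: the names MultipartOffsets, bodyOffsets and PartBody are hypothetical, the boundary is assumed to be known already (for example from the top-level Content-Type header), nested multiparts are not descended into, and only one line is held in memory at a time (a very long unwrapped binary line would be buffered whole). Following RFC 2046, the line break just before each delimiter is treated as part of the delimiter, not the body.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mime4j: one streaming pass over a raw
// single-level multipart message recording where each part body starts/ends.
final class MultipartOffsets {

    /** Half-open byte range [start, end) of one part body in the raw message. */
    static final class PartBody {
        final long start;
        final long end;
        PartBody(long start, long end) { this.start = start; this.end = end; }
    }

    static List<PartBody> bodyOffsets(InputStream raw, String boundary) throws IOException {
        // Buffering is harmless: offsets are counted from the bytes we consume.
        InputStream in = new BufferedInputStream(raw);
        String delimiter = "--" + boundary;
        String closeDelimiter = delimiter + "--";
        List<PartBody> result = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();

        long offset = 0;         // offset of the first byte of the current line
        int prevTerminator = 0;  // length of the previous line's CRLF or LF
        long bodyStart = -1;     // -1 while not inside a part body
        boolean inHeaders = false;

        int n;
        while ((n = readLine(in, line)) > 0) {
            byte[] bytes = line.toByteArray();
            int term = terminatorLength(bytes);
            // ISO-8859-1 maps each byte to exactly one char, so nothing shifts.
            String text = new String(bytes, 0, bytes.length - term, StandardCharsets.ISO_8859_1);
            String trimmed = stripTrailingBlanks(text); // padding after a boundary is allowed

            boolean isClose = trimmed.equals(closeDelimiter);
            if (isClose || trimmed.equals(delimiter)) {
                if (bodyStart >= 0) {
                    // The line break before a delimiter belongs to the delimiter.
                    result.add(new PartBody(bodyStart, Math.max(bodyStart, offset - prevTerminator)));
                    bodyStart = -1;
                }
                if (isClose) {
                    break; // only the epilogue follows
                }
                inHeaders = true;
            } else if (inHeaders && text.isEmpty()) {
                inHeaders = false;
                bodyStart = offset + n; // body begins right after the blank line
            }
            offset += n;
            prevTerminator = term;
        }
        return result;
    }

    // Reads one line, terminator included, into buf; returns bytes consumed (0 at EOF).
    private static int readLine(InputStream in, ByteArrayOutputStream buf) throws IOException {
        buf.reset();
        int count = 0;
        int b;
        while ((b = in.read()) != -1) {
            buf.write(b);
            count++;
            if (b == '\n') {
                break;
            }
        }
        return count;
    }

    private static int terminatorLength(byte[] b) {
        int len = b.length;
        if (len >= 2 && b[len - 2] == '\r' && b[len - 1] == '\n') return 2;
        if (len >= 1 && b[len - 1] == '\n') return 1;
        return 0;
    }

    private static String stripTrailingBlanks(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == ' ' || s.charAt(end - 1) == '\t')) end--;
        return s.substring(0, end);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = ("preamble\r\n--XYZ\r\nContent-Type: text/plain\r\n\r\nhello\r\n"
                + "--XYZ\r\nContent-Transfer-Encoding: base64\r\n\r\naGk=\r\n--XYZ--\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        for (PartBody p : bodyOffsets(new ByteArrayInputStream(raw), "XYZ")) {
            System.out.println(p.start + ".." + p.end);
        }
    }
}
```

The offsets are taken from the raw input, so they can be handed to the legacy system directly; the bytes they delimit are still transfer-encoded.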

Now I am thinking that maybe I am going about this the wrong way, so I
would very much appreciate any ideas on how to solve this in a simple
manner.

NB: I also posted a more open version of this question on SO:
https://stackoverflow.com/questions/64155766/find-byte-offsets-for-e-mail-attachments

Re: Byte offsets for attachments

Posted by Lasse Lindgård <la...@lldata.dk>.

Yes. The legacy system handles the encoding; the encoding type is passed to
it as a separate parameter.
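Since the bytes at the offset are still transfer-encoded, the consumer has to decode them using the encoding passed alongside the offset. A minimal sketch of that consumer side, assuming the hypothetical names DecodeSlice and decode (neither comes from the thread or from mime4j), could use the JDK's java.util.Base64 MIME decoder. Only base64 and the identity encodings (7bit, 8bit, binary) are covered; quoted-printable has no decoder in the JDK and is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Locale;

// Hypothetical consumer-side helper: decode the body bytes in [start, end)
// of a raw message according to its Content-Transfer-Encoding.
final class DecodeSlice {

    static byte[] decode(byte[] raw, int start, int end, String transferEncoding) {
        byte[] slice = Arrays.copyOfRange(raw, start, end);
        switch (transferEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "base64":
                // The MIME decoder skips the CRLFs that wrap base64 lines.
                return Base64.getMimeDecoder().decode(slice);
            case "7bit":
            case "8bit":
            case "binary":
                return slice; // identity encodings: the bytes are the content
            default:
                throw new IllegalArgumentException("unsupported encoding: " + transferEncoding);
        }
    }

    public static void main(String[] args) {
        // "aGVs" + CRLF + "bG8=" is "hello" in base64, wrapped over two lines.
        byte[] raw = "xxaGVs\r\nbG8=yy".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = decode(raw, 2, 12, "base64");
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```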

It is an existing, working integration. I am just porting the old system on
my end, and I'd rather use mime4j than port the kitchen-sink mail parser
🤢

It looks like getting the byte offset for the attachment body is the only
missing feature.



On Fri, 2 Oct 2020 at 03:54, Tellier Benoit <bt...@apache.org> wrote:

> Hi,
>
> First, be aware that by just getting the offset, the attachment will
> still be encoded. Is it something your legacy system can tolerate?

Re: Byte offsets for attachments

Posted by Tellier Benoit <bt...@apache.org>.

Hi,

First, be aware that if you only get the offset, the attachment will still
be in its encoded form. Is that something your legacy system can tolerate?


On 01/10/2020 at 20:42, Lasse Lindgård wrote:
> My intuition tells me that if mime4j can return an InputStream for the
> current attachment body, getting the byte offset should be easy. But byte
> offsets are not tracked. The library seems to be aware of line numbers
> internally, but they are not exposed, and I suspect they are "off" because
> the streams are buffered.
I lack the knowledge to answer this part.