You are viewing a plain text version of this content. The canonical link for it is here.

Posted to server-dev@james.apache.org by Stefano Bagnara <ap...@bago.org> on 2006/01/22 16:18:46 UTC

Memory, Large messages and SharedInputStreams (Was: Apache James / JAMES-134)

Ofir Gross wrote:
> Hi,
> I share my thoughts:

I appreciate this. Sorry If it takes so much to reply but I've been busy 
with the spool synchronization problems and with my day-job.

I'd like to keep the conversation on the dev-list because it is archived 
for future reference and somebody else could eventually partecipate to 
the brainstorming.

>  There is a problem with the constructor:
> 
>    public MimeMessage(MimeMessage source) throws MessagingException {
>         [...]
>         source.writeTo(bos);       <<----------Problem is here
> [...]
> Because it __always__ saves into a ByteArrayInputStream which is in 
> memory. The only way to avoid it is either __not__ to call that 
> constructor, or to subclass it like I did, in order to overide this 
> method. When I have overiden this constructor, I saved to a temporary 
> file instead of a byte array.

Ok, that's clear, we must remove calls to that constructor from our 
code, and we should use new MimeMessage(Session s, InputStream is).
Furthermore we should pass a SharedInputStream to that constructor.

> You suggested to use the existing repository instead, which is a good 
> idea. But two questions:
> 1. how do I locate the message in the repository form the MimeMessage 
> "source"?

I think this is not possible. You can get the source starting from the 
Mail object in the repository but not viceversa.
Why do you need that?

> 2. Does every MimeMessage exist in the repository?

Not every mimemessage, because when we create a new mimemessage from 
scratch (like the bounce mailet) we create the MimeMessage and after 
that we send it to James that store it in the spool.
BTW most time when we deal with big messages we probably received them 
from the SMTPServer and we already have them in streamrepositories or 
mailrepositories.

When the SMTPServer receive a new message it currently create a new 
MailImpl using the "public MailImpl(String name, MailAddress sender, 
Collection recipients, InputStream messageIn)" costructor.
That constructor create a new MimeMessageSource using the InputStream 
from the socket (new MimeMessageInputStreamSource(name, messageIn);)
MimeMessageInputStreamSource currently store the message in a .m64 file 
and eventually provide the stream to the following users:

public synchronized InputStream getInputStream() throws IOException {
    return new BufferedInputStream(new FileInputStream(file));
}

This is the first step.
Maybe we should start changing the MimeMessageInputStreamSource to 
provide a Shared input stream?!?

Then, when a message is stored in a dbfile repository then the header is 
stored in the body field of the repository db table while the body of 
the message is stored in a streamrepository.

MimeMessageJDBCSource is the object that handle this behaviour. So every 
  time we read a message from the db repository we use this object.
We call this method of that object to get the InputStream:
---
public synchronized InputStream getInputStream() throws IOException
---
and it create a new SequenceInputStream using a ByteArrayInputStream of 
the header and a the input stream provided by the streamrepository.get().
---
InputStream in = new ByteArrayInputStream(headers);
if (sr != null) {
    in = new SequenceInputStream(in, sr.get(key));
}
---
The implementation of the streamrepository.get used is in 
File_Persistent_Stream_Repository and you can see it uses a
---
final ResettableFileInputStream stream = new ResettableFileInputStream( 
getFile( key ) );
---

Maybe we should work on the above files to always be able to provide 
shared inputstreams.

> The other constructor: "public MimeMessage(Session session, InputStream 
> is)" will be OK if the InputStream provided to it will be a 
> SharedInputStream, and that can be done by tracking down calls to this 
> constractor (by grep "new MimeMessage"), and wrapping up the provided 
> input stream with a SharedInputStream implementing wrapper. I did that 
> grep and looked at the results, and this looks possible.

The main one is on the MimeMessageWrapper.loadMessage()
----
in = source.getInputStream();
headers = loadHeaders(in);

ByteArrayInputStream headersIn = new 
ByteArrayInputStream(headers.toByteArray());
in = new SequenceInputStream(headersIn, in);

message = new MimeMessage(session, in);
----
source is the MimeMessageJDBCSource we already seen before and we could 
change it to provide a SharedInputStream (to avoid a copy to a new file)
You see a similar pattern as before: we create a new SequenceInputStream 
from a ByteArrayInputStream and the stream provided by the source.
We should find a way to have a SharedInputStream at the end of all this 
steps, without the need to copy the stream on a new file.

We could introduce your proposed wrapped only in a few mailet but we 
should avoid using it in our core because we already have 2 wrappers 
over the mimemessage and we can get better/cleaner result following the 
path I describe in this mail.

> Maybe if all InputStreams provided to that constructor will be 
> SharedInputStream, then all MimeMessage will have a SharedInputStream 
> implementing contentStream as well? If it is true, then the overiding 
> constructor of "MimeMessage(MimeMessage source)" could use that stream, 
> and it will not need to locate the message in the repository. But I am 
> not sure wether it is true, because the constructor "MimeMessage(Session 
> session)" don't touch content, or contentStream, and so if it is 
> constructed that way, it will not have either, and so there will be no 
> InputStream for it, in case it is sent this way to the 
> MimeMessage(MimeMessage source) constructor.

When the message is constructed from someone else using the new 
MimeMessage(session) we have no power on how it is handled.
IMHO we should start optimizing our own operations then we could write a 
few docs on how to write optimized mailets.

A further step would be to use streaming operations also for "db" only 
repositories. We currently use blob operations that write and read full 
bytearrays but most new dbs/jdbc drivers correctly supports the use of 
streams for write and read operations of large contents. Here there 
would be one more issue because they don't provide (obviously) 
SharedInputStreams but their own InputStreams and I don't know how we 
could implement it. We probably should use your proposed wrapper if we 
need that, or otherwise create a wrapper over the previous InputStream 
implementing the SharedInputStream and simply retreaving a new 
InputStream at every call to newStream (returning, in turn, a new 
SharedInputStream).

The worst part of this work is that it is very difficult to unit-test 
memory issues: any idea on how to test it?

Stefano

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

Re: Memory, Large messages and SharedInputStreams (Was: Apache James / JAMES-134)

Posted by Markus Kühn <ty...@hotmail.com>.

----- Original Message ----- 
From: "Stefano Bagnara" <ap...@bago.org>
To: "James Developers List" <se...@james.apache.org>
Cc: "Ofir Gross" <of...@hotmail.com>
Sent: Sunday, January 22, 2006 4:18 PM
Subject: Memory, Large messages and SharedInputStreams (Was: Apache James /
JAMES-134)

> When the SMTPServer receive a new message it currently create a new
> MailImpl using the "public MailImpl(String name, MailAddress sender,
> Collection recipients, InputStream messageIn)" costructor.
> That constructor create a new MimeMessageSource using the InputStream from
> the socket (new MimeMessageInputStreamSource(name, messageIn);)
> MimeMessageInputStreamSource currently store the message in a .m64 file
> and eventually provide the stream to the following users:
>
> public synchronized InputStream getInputStream() throws IOException {
>    return new BufferedInputStream(new FileInputStream(file));
> }
>
> This is the first step.
> Maybe we should start changing the MimeMessageInputStreamSource to provide
> a Shared input stream?!?
>
> Then, when a message is stored in a dbfile repository then the header is
> stored in the body field of the repository db table while the body of the
> message is stored in a streamrepository.

This is exactly what I do at the moment.
MimeMessageInputStreamSource.getInputStream() returns a
SharedFileInputStream reading from the .m64 file.

What I also do is using the dbfile repository by default, storing the body
in the file-rep if the size is above 5 MB or else together with the header
in the db.

Storing the body in the file-rep is a simple file-move-operation. The
problem with the SharedFileInputStream is, that as soon you create it one
can not move the file anymore, because of the open stream. So everytime I
move the .m64 into the file-rep I have to close the SharedInputStream and
reopen it afterwards.

This way the changes affect MimeMessageInputStreamSource and
MimeMessageWrapper (as far I can remember). This works in my version of 
JAMES but not in the
official 2.1.3 (tested 6 month ago) mine is based on. There are other places 
in 2.1.3 where
the Message is loaded into memory, but I didn't any further investigation.

Before trying this I eliminated each call to
MimeMessageWrapper.loadMessage() when I rewrote JAMES in 2002.

The only time the message needs to be loaded  is when its content is
changed. Though the javamail docu states that saveChanges() should be
called after a MimeMessage has been changed, doing so only results in
loading the Message into memory (without SharedInputStream) in order to 
ensure the right encoding. But
if you know what encoding to apply to your changes (probably none)
there's no need to call it.

But using SharedInputStream is the way to go.

Markus 

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org