Posted to users@activemq.apache.org by Aleksi Kallio <al...@csc.fi> on 2007/02/14 14:30:43 UTC

FileMessage: we would like to contribute

There has been discussion regarding FileMessage or something similar: a 
message for transferring large amounts of data.

For background, see:
http://issues.apache.org/activemq/browse/AMQ-1075

We need something that offers better service than streaming, built on
top of it of course. That is why we are interested in this and ready to
contribute our effort to making it happen.

We have to be capable of moving BIG files over JMS, on the order of a
couple of gigabytes. Normally they are a lot smaller, but the big ones
must be handled too.

Features/fixes we are looking for are:

1) streaming requires a dedicated destination, which makes things complicated
2) resuming transfers
3) easy monitoring of progress


From core developers we would appreciate feedback on how to proceed
with developing such a feature. From the rest of the community we would
gladly get feedback on how you would like this to be done and what kind
of API would be the best.


In the spirit of making ActiveMQ even better,

--
Aleksi Kallio, Application Architect, Scientific Software Development
P.O. BOX 405, 02101 Espoo, Finland, Tel +358 9 457 2297
CSC is the Finnish IT center for science, www.csc.fi
e-mail: aleksi.kallio@csc.fi

Re: [Spam: 5.0] Re: FileMessage: we would like to contribute

Posted by Hiram Chirino <hi...@hiramchirino.com>.

I personally like options 2 and 3.

The message just contains a URL to where the whole stream can be
downloaded.  The nice part about this is that you then have many
options for where you store the message data.  You could make the URL a
reference to the producer, and a consumer would just connect back to
the producer to get the stream.  Or the producer could first upload the
data to the broker and pass on a URL to the broker, and the consumer
would connect back to the broker for the stream.  You could also upload
the file to an HTTP server or even something like S3 and have the
consumer connect to that for the stream.  In all these cases the only
thing that varies is where the producer uploads the data to.  The
message and the consumer behave the same way in all the cases.
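
A minimal Java sketch of this idea, using only the standard library.
The message is reduced to a URL, and the consumer code stays the same
wherever the producer put the data. The names UrlReferenceSketch,
BlobReference, upload and download are hypothetical and not part of
ActiveMQ or JMS; a local temp file stands in for the broker, HTTP
server or S3.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlReferenceSketch {

    /** Hypothetical stand-in for a JMS message that carries only a URL. */
    static final class BlobReference {
        final URL location;

        BlobReference(URL location) {
            this.location = location;
        }
    }

    /**
     * Producer side: put the payload somewhere reachable and return a
     * reference to it. Assumption: a temp file plays the role of the
     * broker / HTTP server / S3 upload target.
     */
    static BlobReference upload(byte[] payload) throws IOException {
        Path stored = Files.createTempFile("blob-", ".bin");
        Files.write(stored, payload);
        return new BlobReference(stored.toUri().toURL());
    }

    /** Consumer side: open the URL and read the stream, whatever it points at. */
    static byte[] download(BlobReference ref) throws IOException {
        try (InputStream in = ref.location.openStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        BlobReference ref = upload("big payload".getBytes(StandardCharsets.UTF_8));
        byte[] received = download(ref);
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

Swapping the upload target would only change upload(); download() is
untouched, which is the property described above.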

I think that the simplest case to implement which will provide the
most robust operation is to have the producer upload the stream to the
broker and pass a reference to the broker's blob and have clients
connect back to it.

On 2/22/07, Aleksi Kallio <al...@csc.fi> wrote:
>
> I'll move this discussion over to dev@activemq.apache.org as it belongs
> there...
>
> Below I go on sketching how FileMessage might work. As you can see it is
> still quite sketchy in some parts so all comments and good ideas are
> welcome.
>
> Btw. is FileMessage a good name? In some cases we are not transferring
> files, but data in memory. Basically we are transferring a finite byte
> sequence. Is ByteSequenceMessage too awkward? I think it is maybe...
>
> > There are a few different possible implementations...
> >
> > (i) FileMessage ends up being a wrapper on top of the existing JMS streams
>
> This would be the preferred method in our case.
>
> The current streaming API uses a designated destination for the
> stream. If that destination is used for other purposes (other streams
> or messages), there has to be a way of separating the current stream
> from the rest of the traffic.
>
> Streams support message selectors, which look like the best solution.
> The problem is that if non-FileMessages are also sent to that
> destination, their listeners have to use selectors as well to weed
> out the FileMessage streams.
>
> It would be great if FileMessage behaved just like other types of
> messages. I guess that is not achievable in the streaming case?
>
> > (ii) FileMessage uses some out of band transfer mechanism.
> > (iii) direct connection. This option is similar to (ii) but rather
> > than putting the file on some remote file server, the file stays on
> > the producer until the consumer has received it;
> >
> > The nice thing is I think all of these approaches can be handled
> > nicely by the single FileMessage API; then it can be a
> > configuration/policy issue as to exactly which implementation is used.
>
> If we look at the three implementations and what they would require from
> the two endpoints:
>
> 1. the producer must tell consumers which selector to use to receive
> data through the JMS stream, and consumers must acknowledge when they
> are ready to receive data
> 2. the producer must upload the file to an external server and put
> its URL in the message
> 3. the producer must open a port and put its URL in the message
>
> ... we see that streaming is maybe the trickiest one. If the producer
> just starts streaming, there is no guarantee (in the general case) that
> the consumer will receive the whole stream from the beginning. Is that
> correct?
>
> In case 2, there is the question about pruning the files. Are they left
> on the external server forever? Or should the consumer be able to
> confirm the transfer, after which the producer would remove the file? I
> don't think we should assume the consumer has write access to the file
> server.
>
> Case 3 is actually pretty straightforward. Do we also want to allow a
> push option where the consumer opens the port and the producer delivers
> the file?
>
> > In terms of getting started, the simplest route is probably to add the
> > API in first (to start with assuming just a URL to the file which is a
> > no brainer) then we should be able to start adding different providers
> > to suit.
>
> Yes, I think that's the best way to go forward. I'll write something
> based on that JIRA issue and send it to dev@activemq.apache.org for
> comments. Does that sound good?
>
>


-- 
Regards,
Hiram

Blog: http://hiramchirino.com

Re: [Spam: 5.0] Re: FileMessage: we would like to contribute

Posted by Aleksi Kallio <al...@csc.fi>.

I'll move this discussion over to dev@activemq.apache.org as it belongs 
there...

Below I go on sketching how FileMessage might work. As you can see it is 
still quite sketchy in some parts so all comments and good ideas are 
welcome.

Btw. is FileMessage a good name? In some cases we are not transferring 
files, but data in memory. Basically we are transferring a finite byte 
sequence. Is ByteSequenceMessage too awkward? I think it is maybe...

> There are a few different possible implementations...
> 
> (i) FileMessage ends up being a wrapper on top of the existing JMS streams

This would be the preferred method in our case.

The current streaming API uses a designated destination for the
stream. If that destination is used for other purposes (other streams
or messages), there has to be a way of separating the current stream
from the rest of the traffic.

Streams support message selectors, which look like the best solution.
The problem is that if non-FileMessages are also sent to that
destination, their listeners have to use selectors as well to weed
out the FileMessage streams.

It would be great if FileMessage behaved just like other types of
messages. I guess that is not achievable in the streaming case?

> (ii) FileMessage uses some out of band transfer mechanism.  
> (iii) direct connection. This option is similar to (ii) but rather
> than putting the file on some remote file server, the file stays on
> the producer until the consumer has received it; 
> 
> The nice thing is I think all of these approaches can be handled
> nicely by the single FileMessage API; then it can be a
> configuration/policy issue as to exactly which implementation is used.

If we look at the three implementations and what they would require from 
the two endpoints:

1. the producer must tell consumers which selector to use to receive
data through the JMS stream, and consumers must acknowledge when they
are ready to receive data
2. the producer must upload the file to an external server and put
its URL in the message
3. the producer must open a port and put its URL in the message

... we see that streaming is maybe the trickiest one. If the producer
just starts streaming, there is no guarantee (in the general case) that
the consumer will receive the whole stream from the beginning. Is that
correct?

In case 2, there is the question about pruning the files. Are they left
on the external server forever? Or should the consumer be able to
confirm the transfer, after which the producer would remove the file? I
don't think we should assume the consumer has write access to the file
server.

Case 3 is actually pretty straightforward. Do we also want to allow a
push option where the consumer opens the port and the producer delivers
the file?

> In terms of getting started, the simplest route is probably to add the
> API in first (to start with assuming just a URL to the file which is a
> no brainer) then we should be able to start adding different providers
> to suit.

Yes, I think that's the best way to go forward. I'll write something 
based on that JIRA issue and send it to dev@activemq.apache.org for 
comments. Does that sound good?

Re: FileMessage: we would like to contribute

Posted by James Strachan <ja...@gmail.com>.

On 2/14/07, Aleksi Kallio <al...@csc.fi> wrote:
>
> There has been discussion regarding FileMessage or something similar: a
> message for transferring large amounts of data.
>
> For background, see:
> http://issues.apache.org/activemq/browse/AMQ-1075
>
> We need something that offers better service than streaming, built on
> top of it of course. That is why we are interested in this and ready to
> contribute our effort to making it happen.
>
> We have to be capable of moving BIG files over JMS, on the order of a
> couple of gigabytes. Normally they are a lot smaller, but the big ones
> must be handled too.
>
> Features/fixes we are looking for are:
>
> 1) streaming requires a dedicated destination, which makes things complicated
> 2) resuming transfers
> 3) easy monitoring of progress
>
>
> From core developers we would appreciate feedback on how to proceed
> with developing such a feature. From the rest of the community we would
> gladly get feedback on how you would like this to be done and what kind
> of API would be the best.
>
>
> In the spirit of making ActiveMQ even better,

Sounds great! Welcome!

The comments on the JIRA issue describe the kind of API I was thinking of...
http://issues.apache.org/activemq/browse/AMQ-1075

There are a few different possible implementations...

(i) FileMessage ends up being a wrapper on top of the existing JMS streams
http://activemq.apache.org/jms-streams.html

(ii) FileMessage uses some out-of-band transfer mechanism, e.g. the
FileMessage essentially just carries around a URL and the file is
transferred using some FTP/HTTP site: the producer sends the file to
the remote file/web server, then sends the message with a reference to
it. The benefit of this is that existing file/web servers can be used
for the actual file transfer piece, while the message broker is used
for reliable message load balancing and failover, i.e. making sure
exactly one consumer processes the message properly, in a
transactional way etc.

(iii) direct connection. This option is similar to (ii) but rather
than putting the file on some remote file server, the file stays on
the producer until the consumer has received it; the producer also
listens on a socket for the consumer. The FileMessage includes a URL
which, when opened on the consumer, essentially connects directly to
the producer of the file. So the FileMessage is a normal JMS message
sent around the JMS network (over clusters & network store & forward
etc). When it's finally received by a consumer, the consumer opens the
URL, which in effect causes the consumer to connect directly to the
producer to stream the file.
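
Option (iii) could look roughly like the following Java sketch, which
uses only the standard library. It is an illustration under stated
assumptions, not a proposed implementation: the class and method names
are hypothetical, the "tcp" scheme in the reference is made up for the
example, and the reference is simply handed to the consumer instead of
travelling in a JMS message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class DirectConnectionSketch {

    /**
     * Producer side: listen on a free local port, serve the payload to the
     * first consumer that connects, and return the reference that would be
     * placed in the message. The "tcp" scheme is an assumption.
     */
    static URI serveOnce(byte[] payload) throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
        int port = server.getLocalPort();
        Thread worker = new Thread(() -> {
            // The data stays on the producer until a consumer asks for it.
            try (ServerSocket s = server;
                 Socket consumer = s.accept();
                 OutputStream out = consumer.getOutputStream()) {
                out.write(payload);
            } catch (IOException e) {
                e.printStackTrace(); // a real provider would report the failure
            }
        });
        worker.setDaemon(true);
        worker.start();
        return URI.create("tcp://127.0.0.1:" + port);
    }

    /** Consumer side: connect straight back to the producer and read to EOF. */
    static byte[] fetch(URI reference) throws IOException {
        try (Socket socket = new Socket(reference.getHost(), reference.getPort());
             InputStream in = socket.getInputStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        URI reference = serveOnce("file contents".getBytes(StandardCharsets.UTF_8));
        byte[] received = fetch(reference);
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

The sketch serves exactly one consumer and does no resuming or
authentication; those would be up to whatever provider implements this
option.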

All of these options have different strengths and weaknesses. I think
for many use cases (iii) is good but I could see folks liking (i) and
(ii) as well.

The nice thing is I think all of these approaches can be handled
nicely by the single FileMessage API; then it can be a
configuration/policy issue as to exactly which implementation is used.

In terms of getting started, the simplest route is probably to add the
API in first (to start with assuming just a URL to the file which is a
no brainer) then we should be able to start adding different providers
to suit.

BTW we might also want to support senders providing a File rather than
an InputStream if you want to be able to monitor progress (as with a
stream you've no idea how long it's gonna be)
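
To illustrate why a File helps here, a small Java sketch using only the
standard library: the file size is known up front, so each chunk
written can be reported against a total. ProgressSketch,
ProgressListener and copyWithProgress are hypothetical names for this
example, not an existing ActiveMQ API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ProgressSketch {

    /** Hypothetical callback; receives bytes sent so far and the total size. */
    interface ProgressListener {
        void onProgress(long sent, long total);
    }

    /**
     * Copies a file to the given stream in chunks and reports progress after
     * each chunk. The total is known only because the source is a file; a
     * bare InputStream would not provide it.
     */
    static long copyWithProgress(Path file, OutputStream out, ProgressListener listener)
            throws IOException {
        long total = Files.size(file);
        long sent = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                sent += n;
                listener.onProgress(sent, total);
            }
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("progress-", ".bin");
        Files.write(file, new byte[20000]);
        copyWithProgress(file, new ByteArrayOutputStream(),
                (sent, total) -> System.out.println(sent * 100 / total + "% sent"));
    }
}
```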

-- 

James
-------
http://radio.weblogs.com/0112098/