Posted to users@servicemix.apache.org by RNobrega <rn...@epo.org> on 2006/12/15 10:00:21 UTC

Some comments on Ftp...Endpoint

We have written an ftp component organised around three ProviderEndpoints:
FTPGet, FTPPut and FTPDir. All accept an xml message like

<ftp>
  <request>
     <url>...</url>
  </request>
</ftp>

FTPGet fetches the file specified in url, FTPPut uploads to the url (both
via a marshaler), and FTPDir returns xml with a directory listing of url. We
did this to overcome these (perceived) limitations of servicemix's ftp
components:

1) FTPEndpoint's uri is immutable (or better, can't be taken from the
   message)

2) There is no way to just find out what files exist on a remote site
   without downloading them

3) FTPPollingEndpoint scales poorly because pollFileOrDirectory(...) does
   not distribute the load across nodes: the files are fetched locally and
   sequentially

What we are currently doing to emulate pollFileOrDirectory is to build a
chain:

FTPDir -> eip-splitter -> FTPGet

Does this make sense? Does this relate in any way to this message:
http://www.nabble.com/listeners-in-container-and-MessageExchange-tf2815007s12049.html#a7862744 ?
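
For illustration, a minimal sketch of the FTPDir -> eip-splitter -> FTPGet
chain described above. It is not the actual component code: the class name,
the method names, the example.org urls and the in-memory map standing in for
the remote site are all assumptions, and no JBI or FTP API is used.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class FtpChainSketch {
    // Hypothetical stand-in for a remote site: url -> file content.
    static final Map<String, String> REMOTE = new TreeMap<>(Map.of(
            "ftp://example.org/in/a.txt", "alpha",
            "ftp://example.org/in/b.txt", "beta",
            "ftp://example.org/other/c.txt", "gamma"));

    // Builds the <ftp><request><url> message (sketch only, no XML escaping).
    static String request(String url) {
        return "<ftp><request><url>" + url + "</url></request></ftp>";
    }

    // Extracts the text of the url element from a request message.
    static String urlOf(String request) {
        int start = request.indexOf("<url>") + "<url>".length();
        return request.substring(start, request.indexOf("</url>"));
    }

    // FTPDir: lists the urls below the requested directory without fetching them.
    static List<String> ftpDir(String request) {
        String dir = urlOf(request);
        return REMOTE.keySet().stream()
                .filter(url -> url.startsWith(dir))
                .collect(Collectors.toList());
    }

    // Splitter: turns one listing into one FTPGet request per file.
    static List<String> split(List<String> urls) {
        return urls.stream().map(FtpChainSketch::request).collect(Collectors.toList());
    }

    // FTPGet: fetches the single file named in the request.
    static String ftpGet(String request) {
        return REMOTE.get(urlOf(request));
    }

    // The whole chain: FTPDir -> splitter -> FTPGet.
    static List<String> poll(String dirUrl) {
        return split(ftpDir(request(dirUrl))).stream()
                .map(FtpChainSketch::ftpGet)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(poll("ftp://example.org/in/"));
    }
}
```

Because every FTPGet request is an independent message, each one could in
principle be routed to a different node, which is the point of the chain.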
-- 
View this message in context: http://www.nabble.com/Some-comments-on-Ftp...Endpoint-tf2825900s12049.html#a7888193
Sent from the ServiceMix - User mailing list archive at Nabble.com.


Re: Some comments on Ftp...Endpoint

Posted by Guillaume Nodet <gn...@gmail.com>.
On 12/15/06, RNobrega <rn...@epo.org> wrote:
>
>
> gnodet wrote:
> >
> >
> >>
> >> 3) FTPPollingEndpoint scales poorly because
> >> pollFileOrDirectory(...)  does not distribute
> >> the load across nodes: the files are fetched
> >> locally and sequentially
> >
> > I don't follow here.  The directory listing is
> > done by one thread, but all actual file reads and
> > jbi stuff is delegated to the thread pool.
> > There's really no reason why it would not scale.
> > Furthermore, the recent changes I made should
> > allow clustering ftp poller endpoints -- provided
> > that we implement a distributed locking mechanism
> > ;)
> >
> >
>
> What you say is true, within one jvm; imagine you
> have 4 separate nodes and 100 files to
> download. Node 1 will be stressed out, while the
> other ones are idle (correct?)
>

If you deploy the same poller endpoint on multiple JVMs,
all will regularly poll the directories for files to process.
All nodes will queue one job for each file to process, but
the locking mechanism will prevent several nodes from handling
the same file, while allowing several nodes to process
different files.  However, I agree that there is no load-balancing,
as one node may start downloading all the files while the other
nodes do nothing.  Well, I guess this could be tuned by configuring
the FTPClientPool or the Executor used, to allow only a limited number
of concurrent connections (I think that the default for the FTPClientPool
is 8).
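
A minimal sketch of that behaviour, under stated assumptions: every node
queues one job per file, a shared claim set stands in for the distributed
lock (which does not exist yet), and a fixed-size thread pool per node
stands in for a limit on concurrent connections. All names here are
hypothetical; none of them are ServiceMix classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PollerLockSketch {
    // Stand-in for a distributed lock: files already claimed by some node.
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();
    // Records which node ended up processing each file.
    private final Map<String, String> processedBy = new ConcurrentHashMap<>();

    // Returns true only for the first caller that claims the file.
    boolean tryClaim(String file) {
        return claimed.add(file);
    }

    // Every node queues one job per file; only the claim winner processes it.
    Map<String, String> run(List<String> files, int nodes, int maxConnectionsPerNode) {
        List<ExecutorService> pools = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            String node = "node-" + n;
            // The pool size caps how many files a node handles concurrently.
            ExecutorService pool = Executors.newFixedThreadPool(maxConnectionsPerNode);
            pools.add(pool);
            for (String file : files) {
                pool.submit(() -> {
                    if (tryClaim(file)) {
                        processedBy.put(file, node);
                    }
                });
            }
        }
        for (ExecutorService pool : pools) {
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processedBy;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add("file-" + i);
        }
        // 4 nodes, 8 connections each (8 being the default guessed at above).
        Map<String, String> result = new PollerLockSketch().run(files, 4, 8);
        System.out.println(result.size() + " files, each processed once");
    }
}
```

The claim guarantees each file is handled exactly once, but nothing in it
spreads the files evenly: which node wins each claim is left to timing.
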
I'm not quite sure how to handle that in the case where we use the
new active polling endpoint.  If you have a clustered quartz timer
(this would ensure that the timer is fired only once in a cluster)
which lists the available files and sends exchanges to download them,
these exchanges may be put in a jms queue (using a jms BC or jms/jca
flow) to achieve load-balancing.  However, I don't know how ActiveMQ
behaves when load-balancing a small number of messages.  This needs
some tuning I guess.
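
The queue idea can be sketched as competing consumers. This is only an
illustration under assumptions: a LinkedBlockingQueue stands in for the jms
queue, threads stand in for nodes, and the listing is assumed to have been
produced once by the clustered timer. It says nothing about how ActiveMQ
actually dispatches messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueBalancingSketch {
    // Puts one "download" message per file on a shared queue and lets the
    // nodes compete for them. Returns how many messages each node handled.
    static Map<String, Integer> distribute(List<String> files, int nodes) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(files);
        Map<String, Integer> handled = new ConcurrentHashMap<>();
        List<Thread> consumers = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            String node = "node-" + n;
            Thread consumer = new Thread(() -> {
                // Each message is delivered to exactly one consumer.
                while (queue.poll() != null) {
                    handled.merge(node, 1, Integer::sum);
                }
            });
            consumers.add(consumer);
            consumer.start();
        }
        for (Thread consumer : consumers) {
            try {
                consumer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add("file-" + i);
        }
        System.out.println(distribute(files, 4));
    }
}
```

With few messages, a fast consumer may drain the queue before the others
start, which is the same concern raised above about small message counts.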

>
> gnodet wrote:
> >
> >
> > I guess the FTPDir could be triggered by a quartz
> > component.  However, this won't be easily
> > clusterable: if you put the same FTPDir component
> > on several nodes, all of them will list the
> > available files and start downloading them, so
> > imho, it won't solve this problem.  In both cases,
> > the only way to solve the problem (imho) is to use
> > a distributed store (on top of a database, for
> > example) or a DUP remover (which would be a
> > good idea to implement for other use cases too).
> >
> > Anyway, this is an interesting approach, one we have
> > already discussed in other threads (see [1] for
> > example).  However, I'd like this service to offer
> > a WSDL description of its operations (list files,
> > upload, download) which would be independent of
> > the ftp protocol, so that it can be implemented by
> > other services (file, webdav, etc...).
> >
> > What do you think ?
> >
>
> I agree 100%, and would very much like to see
> servicemix evolve in the direction I consider
> (perhaps wrongly) the best one for an esb: jbi
> message based services with great support for:
>
> clustering/fail-over and
>
> configuration (some generic way of using xbean.xml
>    for defaults, and then using a datastore to
>    override properties)
>

Cool !  Let's design and code :)



-- 
Cheers,
Guillaume Nodet

Re: Some comments on Ftp...Endpoint

Posted by RNobrega <rn...@epo.org>.

gnodet wrote:
> 
> 
>>
>> 3) FTPPollingEndpoint scales poorly because
>> pollFileOrDirectory(...)  does not distribute
>> the load across nodes: the files are fetched
>> locally and sequentially
> 
> I don't follow here.  The directory listing is
> done by one thread, but all actual file reads and
> jbi stuff is delegated to the thread pool.
> There's really no reason why it would not scale.
> Furthermore, the recent changes I made should
> allow clustering ftp poller endpoints -- provided
> that we implement a distributed locking mechanism
> ;)
> 
> 

What you say is true, within one jvm; imagine you
have 4 separate nodes and 100 files to
download. Node 1 will be stressed out, while the
other ones are idle (correct?)


gnodet wrote:
> 
> 
> I guess the FTPDir could be triggered by a quartz
> component.  However, this won't be easily
> clusterable: if you put the same FTPDir component
> on several nodes, all of them will list the
> available files and start downloading them, so
> imho, it won't solve this problem.  In both cases,
> the only way to solve the problem (imho) is to use
> a distributed store (on top of a database, for
> example) or a DUP remover (which would be a
> good idea to implement for other use cases too).
> 
> Anyway, this is an interesting approach, one we have
> already discussed in other threads (see [1] for
> example).  However, I'd like this service to offer
> a WSDL description of its operations (list files,
> upload, download) which would be independent of
> the ftp protocol, so that it can be implemented by
> other services (file, webdav, etc...).
> 
> What do you think ?
> 

I agree 100%, and would very much like to see
servicemix evolve in the direction I consider
(perhaps wrongly) the best one for an esb: jbi
message based services with great support for:

clustering/fail-over and

configuration (some generic way of using xbean.xml
   for defaults, and then using a datastore to
   override properties)
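
A minimal sketch of the configuration idea, assuming plain
java.util.Properties: one set of values plays the role of the defaults that
would come from xbean.xml, and another plays the role of the overrides read
from a datastore. The class name and the property keys are hypothetical.

```java
import java.util.Properties;

final class LayeredConfigSketch {
    // Returns properties where overrides win and defaults fill the gaps.
    static Properties layered(Properties defaults, Properties overrides) {
        Properties effective = new Properties(defaults);
        effective.putAll(overrides);
        return effective;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();   // stands in for xbean.xml
        defaults.setProperty("ftp.period", "5000");
        defaults.setProperty("ftp.maxConnections", "8");

        Properties overrides = new Properties();  // stands in for the datastore
        overrides.setProperty("ftp.period", "1000");

        Properties effective = layered(defaults, overrides);
        System.out.println(effective.getProperty("ftp.period"));
        System.out.println(effective.getProperty("ftp.maxConnections"));
    }
}
```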


-- 
View this message in context: http://www.nabble.com/Some-comments-on-Ftp...Endpoint-tf2825900s12049.html#a7891991
Sent from the ServiceMix - User mailing list archive at Nabble.com.


Re: Some comments on Ftp...Endpoint

Posted by Guillaume Nodet <gn...@gmail.com>.
On 12/15/06, RNobrega <rn...@epo.org> wrote:
>
> We have written an ftp component organised around three ProviderEndpoints:
> FTPGet, FTPPut and FTPDir. All accept an xml message like
>
> <ftp>
>   <request>
>      <url>...</url>
>   </request>
> </ftp>
>
> FTPGet fetches the file specified in url, FTPPut uploads to the url (both
> via a marshaler), and FTPDir returns xml with a directory listing of url. We
> did this to overcome these (perceived) limitations of servicemix's ftp
> components:
>
> 1) FTPEndpoint's uri is immutable (or better, can't be taken from the
>    message)

This could be enhanced easily.

>
> 2) There is no way to just find out what files exist on a remote site
>    without downloading them

True

>
> 3) FTPPollingEndpoint scales poorly because pollFileOrDirectory(...) does
>    not distribute the load across nodes: the files are fetched locally and
>    sequentially

I don't follow here.  The directory listing is done by one thread, but
all actual file reads and jbi stuff is delegated to the thread pool.
There's really no reason why it would not scale.  Furthermore, the recent
changes I made should allow clustering ftp poller endpoints -- provided
that we implement a distributed locking mechanism ;)

>
> What we are currently doing to emulate pollFileOrDirectory is to build a
> chain:
>
> FTPDir -> eip-splitter -> FTPGet
>
> Does this make sense? Does this relate in any way to this message:
> http://www.nabble.com/listeners-in-container-and-MessageExchange-tf2815007s12049.html#a7862744 ?

Of course it makes sense, and this is completely related to the
mentioned thread.  I guess the FTPDir could be triggered by a quartz
component.  However, this won't be easily clusterable: if you put the same
FTPDir component on several nodes, all of them will list the available
files and start downloading them, so imho, it won't solve this problem.
In both cases, the only way to solve the problem (imho) is to use a
distributed store (on top of a database, for example) or a DUP remover
(which would be a good idea to implement for other use cases too).
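
To make the DUP remover idea concrete, here is a minimal sketch under stated
assumptions: it remembers a bounded number of recently seen keys (a file url,
for instance) in memory and lets a message through only the first time its
key shows up. A clustered version would need the distributed store mentioned
above in place of the in-memory map. The class and method names are
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DuplicateRemoverSketch {
    // Recently seen keys, oldest first; bounded so memory does not grow forever.
    private final Map<String, Boolean> seen;

    DuplicateRemoverSketch(final int capacity) {
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Forget the oldest key once the window is full.
                return size() > capacity;
            }
        };
    }

    // Returns true if the key has not been seen within the current window.
    synchronized boolean firstTime(String key) {
        return seen.putIfAbsent(key, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DuplicateRemoverSketch remover = new DuplicateRemoverSketch(1000);
        String[] listed = {"in/a.txt", "in/b.txt", "in/a.txt"};
        for (String file : listed) {
            System.out.println(file + " -> " + (remover.firstTime(file) ? "process" : "drop"));
        }
    }
}
```
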

Anyway, this is an interesting approach, one we have already discussed in
other threads (see [1] for example).  However, I'd like this service to
offer a WSDL description of its operations (list files, upload, download)
which would be independent of the ftp protocol, so that it can be
implemented by other services (file, webdav, etc...).
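
As a rough sketch of such a protocol-independent contract, expressed as a
Java interface rather than WSDL: the interface name, the method signatures
and the in-memory implementation are assumptions made for illustration, not
an existing ServiceMix API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// The three operations, with nothing ftp-specific in the signatures.
interface RemoteFileService {
    List<String> list(String directoryUrl);
    byte[] download(String fileUrl);
    void upload(String fileUrl, byte[] content);
}

// One possible implementation; ftp, file or webdav ones would share the contract.
final class InMemoryFileService implements RemoteFileService {
    private final Map<String, byte[]> files = new TreeMap<>();

    @Override
    public List<String> list(String directoryUrl) {
        List<String> result = new ArrayList<>();
        for (String url : files.keySet()) {
            if (url.startsWith(directoryUrl)) {
                result.add(url);
            }
        }
        return result;
    }

    @Override
    public byte[] download(String fileUrl) {
        byte[] content = files.get(fileUrl);
        // null signals "no such file" in this sketch.
        return content == null ? null : content.clone();
    }

    @Override
    public void upload(String fileUrl, byte[] content) {
        files.put(fileUrl, content.clone());
    }

    public static void main(String[] args) {
        RemoteFileService service = new InMemoryFileService();
        service.upload("mem://inbox/a.txt", "hello".getBytes());
        System.out.println(service.list("mem://inbox/"));
    }
}
```
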

What do you think ?

[1] http://www.nabble.com/servicemix-ftp%3A-FtpPollingEndpoint--tf2540539s12049.html#a7092415




-- 
Cheers,
Guillaume Nodet