Posted to dev@airavata.apache.org by Ameya Advankar <aa...@umail.iu.edu> on 2017/05/08 04:23:14 UTC

[#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals

Hi Airavata Developers,

I have been exploring and evaluating tus.io as a solution to the following
problems with the file upload functionality in Science Gateway Portals:

*1. Unreliable HTTP Connection*


File uploads in Science Gateway Portals are HTTP uploads, so they rely
heavily on a continuous internet connection on the client machine. If the
network is disrupted, a traditional file upload fails, and the user may
have to retry manually and wait for the upload to succeed. For large
files, e.g. a few hundred megabytes or several gigabytes, each retry
starts over from the beginning and wastes bandwidth and time.


*2. Space constraints on the Server*


An uploaded file is usually staged somewhere on the server until it is
picked up for further processing. Depending on the queuing time, it may
stay there for a considerable time. If users of a Portal upload multiple
large files at the same time, the host machine may run out of space, which
could have adverse effects on the performance of the Portal.


Also, multiple Science Gateway Portals may be hosted on the same web
server. In such a setting, if one Portal fills up server space with
multiple large files, it may affect the performance of the other Portals
on that web server as well. In short, the file-upload functionality should
not affect Portal performance.



We could use a JavaScript library such as Fine Uploader
<https://fineuploader.com/>, Resumable.js <http://www.resumablejs.com> or
flow.js <https://github.com/flowjs/flow.js>, each of which provides a
simple client-side library for resuming file uploads after a disruption.
Fine Uploader seems to be the best of these, judging by community usage
and contributions on GitHub. For each of these client-side libraries, we
would have to incorporate the corresponding server-side code to handle
resumable uploads.

However, each library's JavaScript implementation is unique and uses its
own set of parameters and request headers to achieve resumable uploads, so
the server implementation we adopt would be tightly coupled to the library
we choose. This would introduce a dependency on the library.
To remove the library-based dependency, we can use a client-server
implementation of the tus.io protocol. Using a protocol reduces the
library-level dependency to a protocol-level dependency.
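
To illustrate what a protocol-level dependency amounts to, here is a
minimal sketch in Python of the offset handshake defined by tus 1.0.0: the
client asks for the current Upload-Offset with HEAD and continues with
PATCH from that byte. The header names come from the protocol. The names
InMemoryUpload, patch_headers and upload are hypothetical, and the
"server" is an in-memory stand-in, not a real tus implementation.

```python
"""Toy model of the tus 1.0.0 offset handshake (no network involved)."""

TUS_VERSION = "1.0.0"


class InMemoryUpload:
    """Hypothetical stand-in for one upload resource on a tus server."""

    def __init__(self, length):
        self.length = length
        self.data = bytearray()

    def head(self):
        # A tus server answers HEAD with the number of bytes it already holds.
        return {
            "Tus-Resumable": TUS_VERSION,
            "Upload-Offset": str(len(self.data)),
            "Upload-Length": str(self.length),
        }

    def patch(self, headers, body):
        # The client's offset must match the server's; the protocol answers
        # a mismatch with 409 Conflict.
        if int(headers["Upload-Offset"]) != len(self.data):
            return 409, {}
        self.data.extend(body)
        return 204, {"Upload-Offset": str(len(self.data))}


def patch_headers(offset):
    """Headers a tus client sends with each PATCH request."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Content-Type": "application/offset+octet-stream",
        "Upload-Offset": str(offset),
    }


def upload(resource, payload, chunk_size, fail_on_attempts=()):
    """Send payload in chunks, resuming from the server's offset each time.

    fail_on_attempts lists the PATCH attempts (1-based) that are lost to a
    simulated network disruption. Returns the number of attempts made.
    """
    attempts = 0
    lost = set(fail_on_attempts)
    while True:
        offset = int(resource.head()["Upload-Offset"])  # where to resume
        if offset >= len(payload):
            return attempts
        chunk = payload[offset:offset + chunk_size]
        attempts += 1
        if attempts in lost:
            continue  # simulated drop: nothing reached the server
        status, _ = resource.patch(patch_headers(offset), chunk)
        if status != 204:
            raise RuntimeError("unexpected status %d" % status)
```

Because only the bytes after the server's offset are resent, a dropped
connection costs at most one chunk instead of the whole file. Any client
that speaks these headers could talk to any server that does, which is the
point of depending on the protocol rather than on a library.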

Also, since a tus.io client can be implemented in any language, we could
have multiple types of Gateway Portals, such as web, desktop and native,
which connect to the same tus.io-based server.

The second problem, the space constraint, can be solved by separating the
file-upload process into a microservice located on a separate host. Each
Portal could have its own microservice instance, so that the file-upload
functionality does not hamper Portal performance. Further, the
microservice will have to be secured, and tus.io allows us to do this
via the tus.io hooks
<https://github.com/tus/tusd/blob/master/docs/hooks.md> feature
by implementing the auth code in the *pre-create* hook.
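
A minimal sketch of what such a pre-create check could look like in
Python, assuming the hook is run as an executable that receives a JSON
description of the upload on standard input and rejects the upload by
exiting with a non-zero status. The "MetaData"/"token" layout, the
VALID_TOKENS store and the function names are assumptions made for
illustration; the actual payload layout depends on the tusd version and
should be checked against the hooks documentation linked above.

```python
import json

# Hypothetical token store; a real deployment would ask the portal's
# authentication service instead of using a hard-coded set.
VALID_TOKENS = {"portal-a-secret"}


def is_authorized(hook_input, valid_tokens=VALID_TOKENS):
    """Decide whether an upload may be created.

    Assumes hook_input is a JSON object whose "MetaData" object carries a
    client-supplied "token" (an assumed layout, not a documented one).
    """
    try:
        info = json.loads(hook_input)
    except ValueError:
        return False  # unreadable input is never authorized
    if not isinstance(info, dict):
        return False
    metadata = info.get("MetaData")
    if not isinstance(metadata, dict):
        return False
    return metadata.get("token") in valid_tokens


def exit_code(hook_input):
    """Exit status for the hook: 0 accepts the upload, 1 rejects it."""
    return 0 if is_authorized(hook_input) else 1
```

Used as a hook script, the file would end with
sys.exit(exit_code(sys.stdin.read())) after importing sys. Keeping the
decision in a plain function makes it possible to test the auth logic
without running the upload server.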

Thus, tus.io seems to be a good, flexible and maintainable long-term
solution for implementing file-upload functionality in Science Gateway
Portals.

Thanks & Regards,
Ameya Advankar
Masters in Computer Science,
Indiana University Bloomington

Re: [#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals

Posted by Ameya Advankar <aa...@umail.iu.edu>.
Hi Terri,

I had previously taken a look at your tus servlet implementation while
browsing through the tus community implementations.

I will start on a proof of concept in which the tus.io servlet runs on a
remote server as a standalone application and file uploads use the same
authentication as the portal. This standalone server instance can then be
extended to handle requests from two separate portals.

Thanks,
Ameya Advankar

On Mon, May 8, 2017 at 11:58 AM, Schwartz, Terri <te...@sdsc.edu> wrote:

> I made a Java servlet implementation of the server side of the tus
> protocol: https://github.com/terrischwartz/tus_servlet
>
> I did this for ease of integration with the CIPRES portal. In particular,
> I wanted users who were logged into the portal to be able to upload via the
> tus protocol, and didn't want to have to use anything other than the
> existing login session for authentication. I experimented with using it
> for upload from CIPRES but we don't have a production-ready UI in place yet.
>
> I wanted to extend it to function as a standalone server that could be
> used by multiple applications but never worked out the authentication for
> that.
>
> I looked at the same JavaScript libraries you did before coming across
> tus.io, and wasn't too impressed with them. The tus.io core developers
> are very enthusiastic, friendly and professional. They know what they're
> doing and are great to work with. As you've probably seen, the protocol
> is super simple and does the trick.
>
> Terri
> ------------------------------
> *From:* Miller, Mark [mmiller@sdsc.edu]
> *Sent:* Monday, May 08, 2017 6:08 AM
> *To:* dev@airavata.apache.org
> *Subject:* RE: [#Spring17-Airavata-Courses] Reliable file uploads for
> Science Gateway Portals
>
> Hi Ameya,
>
>
>
> We have thought about this at CIPRES as well.
>
> Terri Schwartz in our group has implemented a Java version of resumable
> file transfer in tus.io, and I have mentioned this as a possible
> SciGaP/Airavata service. I would recommend you talk with Terri and see
> if there is synergy between your thinking and what she has done.
>
>
>
> Mark

RE: [#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals

Posted by "Schwartz, Terri" <te...@sdsc.edu>.
I made a Java servlet implementation of the server side of the tus protocol: https://github.com/terrischwartz/tus_servlet

I did this for ease of integration with the CIPRES portal. In particular, I wanted users who were logged into the portal to be able to upload via the tus protocol, and didn't want to have to use anything other than the existing login session for authentication. I experimented with using it for upload from CIPRES but we don't have a production-ready UI in place yet.

I wanted to extend it to function as a standalone server that could be used by multiple applications but never worked out the authentication for that.

I looked at the same JavaScript libraries you did before coming across tus.io, and wasn't too impressed with them. The tus.io core developers are very enthusiastic, friendly and professional. They know what they're doing and are great to work with. As you've probably seen, the protocol is super simple and does the trick.

Terri

RE: [#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals

Posted by "Miller, Mark" <mm...@sdsc.edu>.
Hi Ameya,

We have thought about this at CIPRES as well.
Terri Schwartz in our group has implemented a Java version of resumable file transfer in tus.io, and I have mentioned this as a possible SciGaP/Airavata service. I would recommend you talk with Terri and see if there is synergy between your thinking and what she has done.

Mark
