Posted to dev@oodt.apache.org by Susana Sanchez <su...@gmail.com> on 2017/02/16 07:56:15 UTC

help: OODT component for distributing data through WAN

Dear all,

I am trying to find out which of the components of Apache OODT is the most
suitable for delivering large data products to remote users (users
distributed across a WAN).

I have read that the CAS File Manager can archive a file to a remote
location, so it could be a candidate. However, it seems this component was
not designed for that purpose, so it is not recommended for distributing
data over a WAN. Is that correct?

I think the components that I am looking for are the Grid product services
(Product server/client, Profile server/client, Query server/client). Am I
right?
If not, I would like to ask you to provide some information about which
OODT components I need to distribute data products through international
networks.

I was not sure whether this is the correct mailing list for this kind of
question. If not, I apologize, and I would appreciate it if you could
forward it to the appropriate address.

Thanks in advance,
Susana.

Re: help: OODT component for distributing data through WAN

Posted by Susana Sanchez Exposito <ss...@iaa.es>.

Thanks again Tom,

So it seems that the OODT component I am looking for is OODT Workflow. I
need to investigate how to use this component to implement a data delivery
service, so I would like to ask you for documentation about it.

So far, I have installed Apache OODT (
https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT) and
I have been playing around with the File Manager component, following this
document:

https://cwiki.apache.org/confluence/display/OODT/OODT+Filemgr+User+Guide

However, I did not find a similar document for the OODT Workflow component.
I have just seen these wiki pages:

https://cwiki.apache.org/confluence/display/OODT/Workflow2+Quick+Start+Guide
https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide

I don't know the difference between Workflow1 and Workflow2, so I am not
sure if these are the guides that I should follow.

I have also found this tutorial:

https://oodt.apache.org/site_docs/cas-workflow/user/basic.html

But I think I would need something more to start working with this
component, so if you can point me to other tutorials or documentation I
would be very grateful.


Susana.






-- 
Susana Sánchez Expósito

Instituto de Astrofísica de Andalucía - CSIC
Glorieta de la Astronomía, s/n. E-18008, Granada
Tel:(+34) 958 121 311 / (+34) 958 230 635
Fax:(+34) 958 814 530
e-mail: sse@iaa.es

Re: help: OODT component for distributing data through WAN

Posted by Tom Barber <to...@meteorite.bi>.

Hi Susana,

Aggregating this and the off-list email: you could technically connect the
FM to the users' storage, but that's probably not the correct way to go
about it.

OODT is a toolbox at the end of the day, so you pick the parts that enhance
what you're already doing. One of those certainly seems to be the ingestion
of data and capture of metadata, which the File Manager could handle; OODT
would then be the gateway to the ingested files. Off the back of that you
could implement a workflow, triggered post-ingestion, on a timer, or however
you like, that figures out what to do with your data.

For example: a process ingests new data -> this triggers a workflow -> the
workflow looks at the new data and looks up the metadata for the new files
-> the workflow then fires up a GridFTP client, or whatever delivery
mechanism you use, to deliver the files to the end user.

Of course, in reality the workflow could have any number of steps and scale
in many different ways, but that is one very simple overview of an OODT
workflow.
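
As a rough illustration only, the ingest -> trigger -> metadata lookup ->
delivery chain can be sketched in plain Python. This does not use the OODT
Workflow Manager API; the names on_ingest, run_workflow, lookup_metadata and
deliver, the dict-based catalog and the "Instrument" metadata key are all
hypothetical, and the delivery step is a stub that only records what it
would send.

```python
def lookup_metadata(context):
    """Task 1: fetch the metadata recorded for the newly ingested product."""
    product_id = context["product_id"]
    context["metadata"] = context["catalog"][product_id]


def deliver(context):
    """Task 2: hand the product to a delivery mechanism (stubbed here)."""
    # A real task would start GridFTP or another transfer tool instead.
    context["delivered"].append(
        (context["product_id"], context["metadata"]["Instrument"])
    )


# Hypothetical workflow definition: an ordered list of tasks.
WORKFLOW = [lookup_metadata, deliver]


def run_workflow(tasks, context):
    """Run each task in order; tasks communicate through the shared context."""
    for task in tasks:
        task(context)
    return context


def on_ingest(product_id, catalog, delivered):
    """Trigger: called once for every newly ingested product."""
    context = {
        "product_id": product_id,
        "catalog": catalog,
        "delivered": delivered,
    }
    return run_workflow(WORKFLOW, context)
```

Adding a step (post-processing, notification, and so on) then amounts to
appending another task to the list.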

Tom


Re: help: OODT component for distributing data through WAN

Posted by Susana Sanchez Exposito <ss...@iaa.es>.

Thanks Tom,

From your answer I gather that I can use the OODT File Manager component to
deliver large data products (from GBs to TBs) to remote users (i.e., users
who are globally distributed).

I still have some doubts; let me add them inline between your lines:

2017-02-16 13:18 GMT+01:00 Tom Barber <ma...@apache.org>:

> Hi Susana
>
> Welcome to the OODT list, this is indeed the correct place to ask about
> OODT related stuff.
>
> How you deliver data, I guess often depends on your requirements, but OODT
> was certainly designed with that type of thing in mind.
>
> The file manager is very flexible in terms of storage and is a portal
> allowing for the ingestion of data products to a file store, this could be
> a folder on a disk, nfs mount or something else, a HDFS cluster, S3 or
> something completely different. So the system will ingest data into the
>

Do you mean that I can connect the File Manager to the users' file stores,
so that when the File Manager stores the data products it is, in practice,
delivering them to the users?

Given that the users' file stores would be located remotely (possibly over
high-latency networks), I would be worried about the performance of this
option.

In addition, with this option I would not be able to select or filter which
data products are delivered to each user based on the products' metadata.




> file manager either through an API call, a crawling service or something
> else. During this operation metadata from the ingested files is then
> extracted, for example if this were an image, you could extract EXIF data,
> GEO data etc and then store that in the catalogue alongside the ingested
> product.
>
> There is a basic UI for showing ingested products called Ops UI, but in
> reality for deployment as a service there would be a web interface written
> to integrate into whatever application or portal you are already using,
> which would then allow users to search for products via metadata or keys in
> the ingested data. From that search users could then do a range of things
> depending on what your requirements are, the simplest being clicking a link
> to download the product. But of course it could be triggering a workflow,
> copying the file somewhere else or whatever.
>
> Behind the File Manager is also the workflow manager, so another scenario
> might be to ingest files into the file manager, which in turn triggers a
> workflow which then distributes the ingest files to people automatically,
> or performs some post processing etc.
>

OK. So I would need to implement this workflow in such a way that 1) it
selects/filters which data products will be delivered to each user, and 2)
it sends the data products to the remote users by means of efficient data
movement tools (e.g. GridFTP).
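
A minimal sketch of those two steps, under stated assumptions: the catalog is
a plain dict of product id -> metadata, each user's interest is expressed as
exact-match criteria, and delivery uses the globus-url-copy command-line
client with its -p option for parallel streams. The function names, metadata
keys, paths and host name are hypothetical, and the command is only built
here, not executed.

```python
def matches(metadata, criteria):
    """True when every criterion equals the product's metadata value."""
    return all(metadata.get(key) == value for key, value in criteria.items())


def select_products(catalog, criteria):
    """Step 1: pick the product ids whose metadata satisfies the criteria."""
    return sorted(pid for pid, md in catalog.items() if matches(md, criteria))


def transfer_command(local_path, remote_url, streams=4):
    """Step 2: build a globus-url-copy command line for one product.

    local_path must be absolute so that the resulting file:// URL is valid.
    """
    if not local_path.startswith("/"):
        raise ValueError("local_path must be an absolute path")
    return [
        "globus-url-copy",
        "-p", str(streams),       # number of parallel data streams
        "file://" + local_path,   # source: local file
        remote_url,               # destination, e.g. a gsiftp:// URL
    ]
```

For example, with a user interested only in one instrument:

```python
catalog = {
    "obs-001": {"Instrument": "CAM-A", "Path": "/archive/obs-001.fits"},
    "obs-002": {"Instrument": "CAM-B", "Path": "/archive/obs-002.fits"},
}
for pid in select_products(catalog, {"Instrument": "CAM-A"}):
    cmd = transfer_command(
        catalog[pid]["Path"],
        "gsiftp://user-site.example.org/incoming/" + pid + ".fits",
    )
    # cmd could then be passed to subprocess.run(cmd, check=True)
```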



>
> Let us know if you have any further questions.
>

Thanks again!

Susana.






-- 
Susana Sánchez Expósito

Instituto de Astrofísica de Andalucía - CSIC
Glorieta de la Astronomía, s/n. E-18008, Granada
Tel:(+34) 958 121 311 / (+34) 958 230 635
Fax:(+34) 958 814 530
e-mail: sse@iaa.es

Re: help: OODT component for distributing data through WAN

Posted by Tom Barber <ma...@apache.org>.

Hi Susana

Welcome to the OODT list; this is indeed the correct place to ask about
OODT-related stuff.

How you deliver data often depends on your requirements, I guess, but OODT
was certainly designed with that type of thing in mind.

The File Manager is very flexible in terms of storage and is a portal for
ingesting data products into a file store. This could be a folder on a
disk, an NFS mount, an HDFS cluster, S3, or something completely different.
The system ingests data into the File Manager through an API call, a
crawling service, or something else. During this operation, metadata is
extracted from the ingested files (for example, if this were an image, you
could extract EXIF data, GEO data, etc.) and stored in the catalogue
alongside the ingested product.
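
To make the ingest-time metadata step concrete, here is a small
self-contained sketch. It is not the File Manager API: the Catalog class,
extract_metadata and ingest are hypothetical stand-ins, and the extracted
fields (file name, size, SHA-256 checksum) are chosen only because they need
nothing beyond the Python standard library.

```python
import hashlib
import os
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a metadata catalogue."""
    records: dict = field(default_factory=dict)

    def add(self, product_id, metadata):
        self.records[product_id] = metadata


def extract_metadata(path, chunk_size=1 << 20):
    """Derive simple metadata from a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "Filename": os.path.basename(path),
        "FileSize": os.path.getsize(path),
        "SHA256": digest.hexdigest(),
    }


def ingest(path, catalog):
    """Extract metadata and store it in the catalogue under the file name."""
    metadata = extract_metadata(path)
    product_id = metadata["Filename"]
    catalog.add(product_id, metadata)
    return product_id
```

A format-specific extractor (EXIF for images, for instance) would slot in
where extract_metadata is called.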

There is a basic UI for showing ingested products called Ops UI, but in
reality, for deployment as a service, a web interface would be written to
integrate into whatever application or portal you are already using. It
would allow users to search for products via metadata or keys in the
ingested data. From that search, users could then do a range of things
depending on your requirements, the simplest being clicking a link to
download the product. But of course it could also be triggering a workflow,
copying the file somewhere else, or whatever.

Behind the File Manager there is also the Workflow Manager, so another
scenario might be to ingest files into the File Manager, which in turn
triggers a workflow that distributes the ingested files to people
automatically, or performs some post-processing, etc.

Let us know if you have any further questions.

Tom
