Posted to dev@oodt.apache.org by Chris Mattmann <ma...@apache.org> on 2014/03/02 08:18:49 UTC

Re: Running operations over data

Great reply Cam



-----Original Message-----
From: Cameron Goodale <si...@gmail.com>
Reply-To: "user@oodt.apache.org" <us...@oodt.apache.org>
Date: Wednesday, February 26, 2014 10:32 PM
To: "user@oodt.apache.org" <us...@oodt.apache.org>
Subject: Re: Running operations over data

>Hey Tom,
>
>
>TLDR - Crawler ships with some actions, but you can write your own
>actions, and those actions can be wired into PreIngestion or
>PostIngestion.  FileManager has MetExtractors that run before ingestion;
>they are traditionally meant to extract metadata (as the name implies),
>but you could just as easily have one run a checksum and store it in
>metadata, or convert an incoming file into a PDF and then ingest the PDF.
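
A rough, self-contained sketch of the checksum idea above, in plain Java.
The class and method names are simplified stand-ins, not the real
cas-crawler API; in an actual deployment this logic would live in a class
extending the crawler's own action base class, wired into the pre- or
post-ingest phase through the crawler configuration.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.Map;

    // Simplified stand-in for a crawler action that checksums a product
    // and records the result as metadata before it is ingested.
    public class ChecksumAction {

        // Digest the file and add the hex MD5 to the product metadata.
        public boolean performAction(Path product, Map<String, String> metadata) {
            try (InputStream in = Files.newInputStream(product)) {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                byte[] buf = new byte[8192];
                for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                    md5.update(buf, 0, n);
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : md5.digest()) {
                    hex.append(String.format("%02x", b));
                }
                metadata.put("MD5Checksum", hex.toString());
                return true;   // success: the crawler can carry on with ingestion
            } catch (Exception e) {
                return false;  // a failed action can be used to veto the phase
            }
        }

        public static void main(String[] args) {
            Map<String, String> met = new HashMap<>();
            new ChecksumAction().performAction(Path.of(args[0]), met);
            System.out.println(met);
        }
    }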
>
>
>
>
>On the Snow Data System here at JPL we have a lights out operation that
>might be of interest, so I will try to explain it below.
>
>
>1.  Every hour OODT PushPull wakes up and tries to download new data from
>a Near Real Time Satellite Imagery service via FTP
>(http://lance-modis.eosdis.nasa.gov/)
>2.  Every 20 minutes OODT Crawler wakes up and crawls a local file
>staging area where PushPull downloads Satellite Images
>3.  When the crawler encounters files that have been downloaded and are
>ready for ingestion, things get interesting.  During the crawl several
>pre-conditions need to be met (the file cannot already be in the catalog,
>guarding against duplicates; the file has to be of the correct mime-type;
>etc.).
>4.  If the preconditions pass then Crawler will ingest the file(s) into
>OODT FileManager, but things don't stop there.
>5.  Crawler has a post-ingest success hook that we leverage: we use the
>"TriggerPostIngestWorkflow" action, which automatically submits an event
>to the Workflow Manager (see the sketch after this list).
>6.  OODT Workflow Manager receives the event (in this example it would be
>"MOD09GANRTIngest") and boils it down into tasks that get run.
>7.  Workflow Manager then sends these tasks to the OODT Resource Manager,
>which farms the jobs out to Batchstubs running across 4 different
>machines.
>8.  When the jobs complete, Crawler ingests the final outputs back into
>the FileManager.
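
A minimal sketch of the hand-off in step 5, using a hypothetical
WorkflowClient stand-in rather than the real OODT XML-RPC client. The
shipped "TriggerPostIngestWorkflow" action already does this for you; the
sketch only shows the shape of the call.

    import java.util.Map;

    // WorkflowClient is a hypothetical stand-in for the Workflow Manager's
    // client API, used here only to show the shape of the hand-off.
    interface WorkflowClient {
        void sendEvent(String eventName, Map<String, String> metadata) throws Exception;
    }

    // Rough sketch of step 5: after a successful ingest, name an event for
    // the product type (e.g. "MOD09GANRT" -> "MOD09GANRTIngest") and pass
    // the product metadata along. The Workflow Manager maps the event to
    // tasks (step 6) and the Resource Manager farms those tasks out to
    // batch stubs (step 7).
    public class PostIngestTrigger {

        private final WorkflowClient workflow;

        public PostIngestTrigger(WorkflowClient workflow) {
            this.workflow = workflow;
        }

        // Invoked from the crawler's post-ingest success hook.
        public void onIngestSuccess(String productType, Map<String, String> metadata) {
            String event = productType + "Ingest";
            try {
                workflow.sendEvent(event, metadata);
            } catch (Exception e) {
                System.err.println("Could not notify the Workflow Manager: " + e.getMessage());
            }
        }
    }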
>
>
>Hope that helps.
>
>
>Best Regards,
>
>
>
>
>Cameron
>
>
>
>On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber
><to...@meteorite.bi> wrote:
>
>Hello folks,
>
>Preparing for this talk, so I figure I should probably work out how OODT
>works..... ;)
>
>Anyway, I have some ideas as to how to integrate some more non-science
>tools into OODT, but I'm still figuring out some of the components.
>Namely, workflows.
>
>
>If, for example, in the OODT world I wanted to ingest a bunch of data and
>perform some operation on it, does this happen during the ingest phase or
>post-ingest?
>
>Normally you guys would write some crazy scientific stuff, I guess, to
>analyse the data you're ingesting and then dump it into the catalog in
>some different format - does that sound about right?
>
>Thanks
>
>Tom
>-- 
>Tom Barber | Technical Director
>
>meteorite bi
>T: +44 20 8133 3730
>W: www.meteorite.bi |
>Skype: meteorite.consulting
>A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
>
>
>
>
>
>
>
>
>-- 
>
>Sent from a Tin Can attached to a String
>
>



Re: Running operations over data

Posted by Cameron Goodale <si...@gmail.com>.
Hey Tom,

On the Snow Project we use WebDAV because we have users who want access to
the data and just pick and choose what they want.  There are other
OODT-based solutions, but I am not familiar with them; perhaps someone else
can speak to those.
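
Since WebDAV rides on plain HTTP, users can pull archived products with
almost any client. A minimal sketch using only the JDK HTTP client follows;
the host and file name are made up for illustration, and a real setup would
also need authentication and WebDAV PROPFIND listings for browsing.

    import java.io.InputStream;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    // Fetch one archived product over WebDAV with a plain HTTP GET.
    // The URL below is hypothetical and used only for illustration.
    public class WebDavFetch {
        public static void main(String[] args) throws Exception {
            String url = "https://snow.example.org/dav/MOD09GA/2014/03/04/granule.hdf";
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<InputStream> response =
                    client.send(request, HttpResponse.BodyHandlers.ofInputStream());
            try (InputStream body = response.body()) {
                Files.copy(body, Path.of("granule.hdf"), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }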

Cheers,


Cameron


On Tue, Mar 4, 2014 at 11:52 AM, Tom Barber <to...@meteorite.bi> wrote:

>  Okay,
>
> I have a follow up question :)
>
> So you run all the steps set out below.
>
> What then? How do people get access to the data?
>
> I've seen a bunch of screenshots of different frontends that run over the
> filemanager and allow people to export the files that have been ingested.
> Is that the "normal" way of giving people access to the data, or have
> users come up with more novel ways of getting their hands on the data?
>
> Cheers
>
> Tom
>
> On 02/03/14 07:18, Chris Mattmann wrote:
>
> Great reply Cam
>
>
>
> -----Original Message-----
> From: Cameron Goodale <si...@gmail.com>
> Reply-To: "user@oodt.apache.org" <us...@oodt.apache.org>
> Date: Wednesday, February 26, 2014 10:32 PM
> To: "user@oodt.apache.org" <us...@oodt.apache.org>
> Subject: Re: Running operations over data
>
>
>  Hey Tom,
>
>
> TLDR - Crawler ships with some actions, but you can write your own
> actions, and those actions can be wired into PreIngestion or
> PostIngestion.  FileManager has MetExtractors that run before ingestion;
> they are traditionally meant to extract metadata (as the name implies),
> but you could just as easily have one run a checksum and store it in
> metadata, or convert an incoming file into a PDF and then ingest the PDF.
>
>
>
>
> On the Snow Data System here at JPL we have a lights out operation that
> might be of interest, so I will try to explain it below.
>
>
> 1.  Every hour OODT PushPull wakes up and tries to download new data from
> a Near Real Time Satellite Imagery service via FTP
> (http://lance-modis.eosdis.nasa.gov/)
> 2.  Every 20 minutes OODT Crawler wakes up and crawls a local file
> staging area where PushPull downloads Satellite Images
> 3.  When the crawler encounters files that have been downloaded and are
> ready for ingestion, things get interesting.  During the crawl several
> pre-conditions need to be met (the file cannot already be in the catalog,
> guarding against duplicates; the file has to be of the correct mime-type;
> etc.).
> 4.  If the preconditions pass then Crawler will ingest the file(s) into
> OODT FileManager, but things don't stop there.
> 5.  Crawler has a post-ingest success hook that we leverage: we use the
> "TriggerPostIngestWorkflow" action, which automatically submits an event
> to the Workflow Manager.
> 6.  OODT Workflow Manager receives the event (in this example it would be
> "MOD09GANRTIngest") and boils it down into tasks that get run.
> 7.  Workflow Manager then sends these tasks to the OODT Resource Manager,
> which farms the jobs out to Batchstubs running across 4 different
> machines.
> 8.  When the jobs complete, Crawler ingests the final outputs back into
> the FileManager.
>
>
> Hope that helps.
>
>
> Best Regards,
>
>
>
>
> Cameron
>
>
>
> On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber <to...@meteorite.bi> wrote:
>
> Hello folks,
>
> Preparing for this talk, so I figure I should probably work out how OODT
> works..... ;)
>
> Anyway, I have some ideas as to how to integrate some more non-science
> tools into OODT, but I'm still figuring out some of the components.
> Namely, workflows.
>
>
> If, for example, in the OODT world I wanted to ingest a bunch of data and
> perform some operation on it, does this happen during the ingest phase or
> post-ingest?
>
> Normally you guys would write some crazy scientific stuff, I guess, to
> analyse the data you're ingesting and then dump it into the catalog in
> some different format - does that sound about right?
>
> Thanks
>
> Tom
> --
> Tom Barber | Technical Director
>
> meteorite bi
> T: +44 20 8133 3730
> W: www.meteorite.bi |
> Skype: meteorite.consulting
> A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
>
>
>
>
>
>
>
>
> --
>
> Sent from a Tin Can attached to a String
>
>
>
>
>
> --
> *Tom Barber* | Technical Director
>
> meteorite bi
> *T:* +44 20 8133 3730
> *W:* www.meteorite.bi | *Skype:* meteorite.consulting
> *A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG,
> UK
>



-- 

Sent from a Tin Can attached to a String

Re: Running operations over data

Posted by Tom Barber <to...@meteorite.bi>.
Okay,

I have a follow up question :)

So you run all the steps set out below.

What then? How do people get access to the data?

I've seen a bunch of screenshots of different frontends that run over 
the filemanager and allow people to export the files that have been 
ingested. Is that the "normal" way of giving people access to the data, 
or have users come up with more novel ways of getting their hands on 
the data?

Cheers

Tom

On 02/03/14 07:18, Chris Mattmann wrote:
> Great reply Cam
>
>
>
> -----Original Message-----
> From: Cameron Goodale <si...@gmail.com>
> Reply-To: "user@oodt.apache.org" <us...@oodt.apache.org>
> Date: Wednesday, February 26, 2014 10:32 PM
> To: "user@oodt.apache.org" <us...@oodt.apache.org>
> Subject: Re: Running operations over data
>
>> Hey Tom,
>>
>>
>> TLDR - Crawler ships with some actions, but you can write your own
>> actions, and those actions can be wired into PreIngestion or
>> PostIngestion.  FileManager has MetExtractors that run before ingestion;
>> they are traditionally meant to extract metadata (as the name implies),
>> but you could just as easily have one run a checksum and store it in
>> metadata, or convert an incoming file into a PDF and then ingest the PDF.
>>
>>
>>
>>
>> On the Snow Data System here at JPL we have a lights out operation that
>> might be of interest, so I will try to explain it below.
>>
>>
>> 1.  Every hour OODT PushPull wakes up and tries to download new data from
>> a Near Real Time Satellite Imagery service via FTP
>> (http://lance-modis.eosdis.nasa.gov/)
>> 2.  Every 20 minutes OODT Crawler wakes up and crawls a local file
>> staging area where PushPull downloads Satellite Images
>> 3.  When the crawler encounters files that have been downloaded and are
>> ready for ingestion, things get interesting.  During the crawl several
>> pre-conditions need to be met (the file cannot already be in the catalog,
>> guarding against duplicates; the file has to be of the correct mime-type;
>> etc.).
>> 4.  If the preconditions pass then Crawler will ingest the file(s) into
>> OODT FileManager, but things don't stop there.
>> 5.  Crawler has a post-ingest success hook that we leverage: we use the
>> "TriggerPostIngestWorkflow" action, which automatically submits an event
>> to the Workflow Manager.
>> 6.  OODT Workflow Manager receives the event (in this example it would be
>> "MOD09GANRTIngest") and boils it down into tasks that get run.
>> 7.  Workflow Manager then sends these tasks to the OODT Resource Manager,
>> which farms the jobs out to Batchstubs running across 4 different
>> machines.
>> 8.  When the jobs complete, Crawler ingests the final outputs back into
>> the FileManager.
>>
>>
>> Hope that helps.
>>
>>
>> Best Regards,
>>
>>
>>
>>
>> Cameron
>>
>>
>>
>> On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber
>> <to...@meteorite.bi> wrote:
>>
>> Hello folks,
>>
>> Preparing for this talk, so I figure I should probably work out how OODT
>> works..... ;)
>>
>> Anyway, I have some ideas as to how to integrate some more non-science
>> tools into OODT, but I'm still figuring out some of the components.
>> Namely, workflows.
>>
>>
>> If, for example, in the OODT world I wanted to ingest a bunch of data and
>> perform some operation on it, does this happen during the ingest phase or
>> post-ingest?
>>
>> Normally you guys would write some crazy scientific stuff, I guess, to
>> analyse the data you're ingesting and then dump it into the catalog in
>> some different format - does that sound about right?
>>
>> Thanks
>>
>> Tom
>> -- 
>> Tom Barber | Technical Director
>>
>> meteorite bi
>> T: +44 20 8133 3730
>> W: www.meteorite.bi |
>> Skype: meteorite.consulting
>> A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
>>
>>
>>
>>
>>
>>
>>
>>
>> -- 
>>
>> Sent from a Tin Can attached to a String
>>
>>
>


-- 
*Tom Barber* | Technical Director

meteorite bi
*T:* +44 20 8133 3730
*W:* www.meteorite.bi | *Skype:* meteorite.consulting
*A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK

Re: Running operations over data

Posted by Tom Barber <to...@meteorite.bi>.
Aye, yeah, sorry for not responding - been away. Very useful, thanks a lot!

Tom

On 02/03/14 07:18, Chris Mattmann wrote:
> Great reply Cam
>
>
>
> -----Original Message-----
> From: Cameron Goodale <si...@gmail.com>
> Reply-To: "user@oodt.apache.org" <us...@oodt.apache.org>
> Date: Wednesday, February 26, 2014 10:32 PM
> To: "user@oodt.apache.org" <us...@oodt.apache.org>
> Subject: Re: Running operations over data
>
>> Hey Tom,
>>
>>
>> TLDR - Crawler ships with some actions, but you can write your own
>> actions, and those actions can be wired into PreIngestion or
>> PostIngestion.  FileManager has MetExtractors that run before ingestion;
>> they are traditionally meant to extract metadata (as the name implies),
>> but you could just as easily have one run a checksum and store it in
>> metadata, or convert an incoming file into a PDF and then ingest the PDF.
>>
>>
>>
>>
>> On the Snow Data System here at JPL we have a lights out operation that
>> might be of interest, so I will try to explain it below.
>>
>>
>> 1.  Every hour OODT PushPull wakes up and tries to download new data from
>> a Near Real Time Satellite Imagery service via FTP
>> (http://lance-modis.eosdis.nasa.gov/)
>> 2.  Every 20 minutes OODT Crawler wakes up and crawls a local file
>> staging area where PushPull downloads Satellite Images
>> 3.  When the crawler encounters files that have been downloaded and are
>> ready for ingestion, things get interesting.  During the crawl several
>> pre-conditions need to be met (the file cannot already be in the catalog,
>> guarding against duplicates; the file has to be of the correct mime-type;
>> etc.).
>> 4.  If the preconditions pass then Crawler will ingest the file(s) into
>> OODT FileManager, but things don't stop there.
>> 5.  Crawler has a post-ingest success hook that we leverage: we use the
>> "TriggerPostIngestWorkflow" action, which automatically submits an event
>> to the Workflow Manager.
>> 6.  OODT Workflow Manager receives the event (in this example it would be
>> "MOD09GANRTIngest") and boils it down into tasks that get run.
>> 7.  Workflow Manager then sends these tasks to the OODT Resource Manager,
>> which farms the jobs out to Batchstubs running across 4 different
>> machines.
>> 8.  When the jobs complete, Crawler ingests the final outputs back into
>> the FileManager.
>>
>>
>> Hope that helps.
>>
>>
>> Best Regards,
>>
>>
>>
>>
>> Cameron
>>
>>
>>
>> On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber
>> <to...@meteorite.bi> wrote:
>>
>> Hello folks,
>>
>> Preparing for this talk, so I figure I should probably work out how OODT
>> works..... ;)
>>
>> Anyway, I have some ideas as to how to integrate some more non-science
>> tools into OODT, but I'm still figuring out some of the components.
>> Namely, workflows.
>>
>>
>> If, for example, in the OODT world I wanted to ingest a bunch of data and
>> perform some operation on it, does this happen during the ingest phase or
>> post-ingest?
>>
>> Normally you guys would write some crazy scientific stuff, I guess, to
>> analyse the data you're ingesting and then dump it into the catalog in
>> some different format - does that sound about right?
>>
>> Thanks
>>
>> Tom
>> -- 
>> Tom Barber | Technical Director
>>
>> meteorite bi
>> T: +44 20 8133 3730
>> W: www.meteorite.bi |
>> Skype: meteorite.consulting
>> A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
>>
>>
>>
>>
>>
>>
>>
>>
>> -- 
>>
>> Sent from a Tin Can attached to a String
>>
>>
>


-- 
*Tom Barber* | Technical Director

meteorite bi
*T:* +44 20 8133 3730
*W:* www.meteorite.bi | *Skype:* meteorite.consulting
*A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
