Posted to dev@oodt.apache.org by "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> on 2012/11/09 03:11:03 UTC

PushPull framework and custom met extraction

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.

If any of you have experience with this, or can point me on where to look, I'd really appreciate it.

Thanks!
Rishi

--
[1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
[2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/

Re: PushPull framework and custom met extraction

Posted by Sheryl John <sh...@gmail.com>.
Hey Rishi,

The ".tmp" file looks like it's created here [1]. I don't see any custom
met extractors used here since it adds metadata and writes to the temp met
file, but I may be wrong.
Maybe someone who used PushPull can confirm this.


[1]
http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/java/org/apache/oodt/cas/pushpull/retrievalsystem/FileRetrievalSystem.java


On Thu, Nov 8, 2012 at 6:11 PM, Verma, Rishi (388J) <
Rishi.Verma@jpl.nasa.gov> wrote:

>  Hi All -
>
>  I'm wondering if anyone has experience with, or knows the details of how
> to use custom MetExtractors on products that are remotely downloaded via
> PushPull.
>
>  By default, PushPull performs some basic met-extraction and creates a
> ".tmp" file associated with downloaded products, but I'm wondering whether
> this met generation step is customizable.
>
>  I've looked through the configuration files (e.g. [1], [2]) as well as
> the code for PushPull, but I can't seem to locate configuration parameters
> to support the invocation of custom met extractors on downloaded data.
>
>  If any of you have experience with this, or can point me on where to
> look, I'd really appreciate it.
>
>  Thanks!
> Rishi
>
>  --
> [1]
> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
> [2]
> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
>



-- 
-Sheryl

Re: PushPull framework and custom met extraction

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Rishi,

There was no direct hookup between Push Pull and the crawler, by design, to keep them
independent components -- so the best way to do it would be to run the crawler separately as a daemon,
and then use File Guard Met Extraction Pre Conditions to prevent ingesting any files that don't match a particular
pattern and/or that haven't been fully downloaded by Push Pull.

Cheers,
Chris
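
The two checks described above (name pattern, download finished) can be sketched in a few lines of Python. This is only an illustration of the idea, not the cas-crawler precondition API: the function name, its parameters, and the "file has not been modified for a while" test for a finished download are all assumptions made for the sketch.

```python
import os
import re
import time


def passes_preconditions(path, name_pattern, min_quiet_seconds, now=None):
    """Return True only if the file looks like a finished product.

    Hypothetical checks, not the OODT precondition API:
      1. the file name matches name_pattern in full, and
      2. the file has not been modified for at least min_quiet_seconds,
         which is taken here as a sign that the download has finished.
    """
    now = time.time() if now is None else now
    if re.fullmatch(name_pattern, os.path.basename(path)) is None:
        return False
    age = now - os.path.getmtime(path)
    return age >= min_quiet_seconds
```

A crawler loop would call this for each candidate file and skip anything that returns False, so a partially downloaded file is simply picked up on a later pass.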

On Nov 10, 2012, at 8:06 AM, Verma, Rishi (388J) wrote:

> Hey Brian, Sheryl,
> 
> Thanks for your input and clarification on this.
> 
> Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process following completion of downloads? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked following the completion of all file downloads via pushpull. The alternative way of doing this could, of course, be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.
> 
> Thanks to both of you for your help!
> rishi
> 
> On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:
> 
>> 
>> Hey Rishi,
>> 
>> You will need to use both cas-pushpull and cas-crawler to accomplish this...
>> 
>> cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.
>> 
>> cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)
>> 
>> HTH
>> -brian
>> 
>> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:
>> 
>>> Hi All -
>>> 
>>> I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull. 
>>> 
>>> By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.
>>> 
>>> I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.
>>> 
>>> If any of you have experience with this, or can point me on where to look, I'd really appreciate it.
>>> 
>>> Thanks! 
>>> Rishi 
>>> 
>>> --
>>> [1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>>  
>>> [2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
> 


Re: PushPull framework and custom met extraction

Posted by "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov>.
Hey Brian,

That sounds pretty reasonable. Thanks for your help on this!

rishi

On Nov 9, 2012, at 12:07 PM, Brian Foster wrote:

Hey Rishi,

The filemgr connection from the pushpull is just to verify if the filemgr already has a file, so the pushpull doesn't redownload files (no ingest support)... usually you configure your pushpull daemon to run at longer interval times, but the crawler usually will wake up more often (every 30 seconds is a typical interval time for it)... so just have the pushpull download its files to a staging area which is the same directory which the crawler is monitoring.

-brian

On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:

Hey Brian, Sheryl,

Thanks for your input and clarification on this.

Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process following completion of downloads? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked following the completion of all file downloads via pushpull. The alternative way of doing this could, of course, be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.

Thanks to both of you for your help!
rishi

On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:


Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)

HTH
-brian

On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.

If any of you have experience with this, or can point me on where to look, I'd really appreciate it.

Thanks!
Rishi

--
[1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties

[2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/



Re: PushPull framework and custom met extraction

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
+1...

Cheers,
Chris

On Nov 10, 2012, at 9:07 AM, Brian Foster wrote:

> Hey Rishi,
> 
> The filemgr connection from the pushpull is just to verify if the filemgr already has a file, so the pushpull doesn't redownload files (no ingest support)... usually you configure your pushpull daemon to run at longer interval times, but the crawler usually will wake up more often (every 30 seconds is a typical interval time for it)... so just have the pushpull download its files to a staging area which is the same directory which the crawler is monitoring.
> 
> -brian
> 
> On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:
> 
>> Hey Brian, Sheryl,
>> 
>> Thanks for your input and clarification on this.
>> 
>> Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process following completion of downloads? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked following the completion of all file downloads via pushpull. The alternative way of doing this could, of course, be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.
>> 
>> Thanks to both of you for your help!
>> rishi
>> 
>> On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:
>> 
>>> 
>>> Hey Rishi,
>>> 
>>> You will need to use both cas-pushpull and cas-crawler to accomplish this...
>>> 
>>> cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.
>>> 
>>> cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)
>>> 
>>> HTH
>>> -brian
>>> 
>>> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:
>>> 
>>>> Hi All -
>>>> 
>>>> I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull. 
>>>> 
>>>> By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.
>>>> 
>>>> I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.
>>>> 
>>>> If any of you have experience with this, or can point me on where to look, I'd really appreciate it.
>>>> 
>>>> Thanks! 
>>>> Rishi 
>>>> 
>>>> --
>>>> [1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>>>  
>>>> [2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
>> 


Re: PushPull framework and custom met extraction

Posted by Brian Foster <ho...@mac.com>.
Hey Rishi,

The filemgr connection from the pushpull is just to verify if the filemgr already has a file, so the pushpull doesn't redownload files (no ingest support)... usually you configure your pushpull daemon to run at longer interval times, but the crawler usually will wake up more often (every 30 seconds is a typical interval time for it)... so just have the pushpull download its files to a staging area which is the same directory which the crawler is monitoring.

-brian
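
A rough sketch of the arrangement described above: one process drops files into a staging directory, and a separate loop wakes up on a short interval and handles whatever is new. This is plain Python for illustration, not cas-crawler itself; crawl_once, run_daemon, the handle callback, and the in-memory "seen" set are all hypothetical names and choices made for the sketch.

```python
import os
import time


def crawl_once(staging_dir, seen, handle):
    """Pass each not-yet-seen regular file in staging_dir to handle().

    Returns the list of paths handled on this pass, in sorted order.
    """
    handled = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if path in seen or not os.path.isfile(path):
            continue
        handle(path)  # e.g. run a custom met extractor, then ingest
        seen.add(path)
        handled.append(path)
    return handled


def run_daemon(staging_dir, handle, interval_seconds=30, max_passes=None):
    """Crawl staging_dir every interval_seconds.

    max_passes=None loops forever; a number stops after that many passes.
    """
    seen = set()
    passes = 0
    while max_passes is None or passes < max_passes:
        crawl_once(staging_dir, seen, handle)
        passes += 1
        if max_passes is None or passes < max_passes:
            time.sleep(interval_seconds)
```

The 30-second default mirrors the interval mentioned above. A real crawler would normally move or ingest each file rather than remember it in memory, and would skip files that are still being downloaded.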

On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:

> Hey Brian, Sheryl,
>
> Thanks for your input and clarification on this.
>
> Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process following completion of downloads? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked following the completion of all file downloads via pushpull. The alternative way of doing this could, of course, be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.
>
> Thanks to both of you for your help!
> rishi
>
> On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:
>
>>
>> Hey Rishi,
>>
>> You will need to use both cas-pushpull and cas-crawler to accomplish this...
>>
>> cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.
>>
>> cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)
>>
>> HTH
>> -brian
>>
>> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:
>>
>>> Hi All -
>>>
>>> I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull. 
>>>
>>> By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.
>>>
>>> I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.
>>>
>>> If any of you have experience with this, or can point me on where to look, I'd really appreciate it.
>>>
>>> Thanks! 
>>> Rishi 
>>>
>>> --
>>> [1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>>  
>>> [2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
>

Re: PushPull framework and custom met extraction

Posted by "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov>.
Hey Brian, Sheryl,

Thanks for your input and clarification on this.

Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process following completion of downloads? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked following the completion of all file downloads via pushpull. The alternative way of doing this could, of course, be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.

Thanks to both of you for your help!
rishi

On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:


Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)

HTH
-brian

On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.

If any of you have experience with this, or can point me on where to look, I'd really appreciate it.

Thanks!
Rishi

--
[1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties

[2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/


Re: PushPull framework and custom met extraction

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Perfect Brian, exactly. Rishi, check out the push_pull_framework.properties file. It mentions which
met fields Push Pull adds to the info.tmp files, and then, like Brian said, you use cas-crawler to augment
and extend them.

Cheers,
Chris
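
To make "augment and extend" concrete, here is a small sketch of merging the met fields that came with a download with fields produced by a custom extractor. It uses plain dicts of key -> list of values and is not the OODT Metadata class; the augment function and the example keys in the comment are hypothetical.

```python
def augment(base, extra):
    """Return a new metadata dict: base values first, then new values from extra.

    Both arguments map a field name to a list of values. Neither input is
    modified, and a value already present for a key is not added twice.
    """
    merged = {key: list(values) for key, values in base.items()}
    for key, values in extra.items():
        existing = merged.setdefault(key, [])
        for value in values:
            if value not in existing:
                existing.append(value)
    return merged


# Example call with made-up field names:
# augment({"RemoteSource": ["ftp://example.org/data"]},
#         {"Instrument": ["demo"]})
```

Keeping the downloaded fields first means the custom extractor only adds to what Push Pull recorded and never silently replaces it.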

On Nov 10, 2012, at 7:08 AM, Brian Foster wrote:

> 
> Hey Rishi,
> 
> You will need to use both cas-pushpull and cas-crawler to accomplish this...
> 
> cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.
> 
> cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)
> 
> HTH
> -brian
> 
> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:
> 
>> Hi All -
>> 
>> I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull. 
>> 
>> By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.
>> 
>> I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.
>> 
>> If any of you have experience with this, or can point me on where to look, I'd really appreciate it.
>> 
>> Thanks! 
>> Rishi 
>> 
>> --
>> [1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>  
>> [2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/


Re: PushPull framework and custom met extraction

Posted by "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov>.
Hey Brian, Sheryl,

Thanks for your input and clarification on this.

Brian - the delegation of duties you described makes sense. Does cas-pushpull have any way to invoke a local crawl process once downloads complete? I know it has a filemgr hookup, but I wonder whether a crawl process can be invoked after all file downloads via pushpull have finished. The alternative, of course, would be to schedule the crawler daemon to run well after the pushpull daemon finishes its work.
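
Editorial sketch of the scheduling alternative described above: nothing in this thread confirms that pushpull can trigger a crawl itself, so the following only illustrates waiting until a staging directory stops changing and then launching a crawl command. The function names, the staging directory, and crawl_command are all hypothetical placeholders; in particular, crawl_command is not the real cas-crawler invocation and must be supplied by the reader.

```python
import subprocess
import time
from pathlib import Path


def snapshot(staging_dir):
    """Map every regular file under staging_dir to its (size, mtime)."""
    result = {}
    for path in Path(staging_dir).rglob("*"):
        try:
            if path.is_file():
                stat = path.stat()
                result[str(path)] = (stat.st_size, stat.st_mtime)
        except FileNotFoundError:
            # The file was moved or deleted between listing and stat; skip it.
            continue
    return result


def wait_until_quiet(staging_dir, quiet_seconds=60.0, poll_seconds=5.0, timeout=None):
    """Block until no file under staging_dir has changed for quiet_seconds.

    Returns True once the directory is quiet, or False if timeout
    (in seconds) elapses first.
    """
    start = time.monotonic()
    last = snapshot(staging_dir)
    last_change = start
    while True:
        now = time.monotonic()
        if now - last_change >= quiet_seconds:
            return True
        if timeout is not None and now - start >= timeout:
            return False
        time.sleep(poll_seconds)
        current = snapshot(staging_dir)
        if current != last:
            # Something was added, removed, resized, or rewritten.
            last = current
            last_change = time.monotonic()


def crawl_after_downloads(staging_dir, crawl_command, **wait_kwargs):
    """Run crawl_command (a placeholder argument list) once downloads settle."""
    if not wait_until_quiet(staging_dir, **wait_kwargs):
        raise TimeoutError("staging directory never became quiet")
    return subprocess.run(crawl_command, check=True).returncode
```

A quiet period is only a heuristic for "downloads finished"; a slow or stalled transfer can look quiet, so the threshold should be chosen generously for the remote site in question.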

Thanks to both of you for your help!
rishi

On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:


Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)

HTH
-brian

On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.

If any of you have experience with this, or can point me on where to look, I'd really appreciate it.

Thanks!
Rishi

--
[1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties

[2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/


Re: PushPull framework and custom met extraction

Posted by Brian Foster <ho...@mac.com>.
Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local systems... the .tmp files contain cas-pushpull's known metadata and you can configure which of the known metadata gets written out or if a .tmp file gets created at all... however you can add custom metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on your local system... and then allows you to ingest them into the filemgr (optionally can be turned off)
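
Editorial sketch of the division of duties described above: one stage records what is known at download time, and a later local stage extracts custom fields and layers them on top. This is a conceptual illustration only. It does not use the OODT APIs or the real .tmp file format, and every key and function name here is a hypothetical placeholder.

```python
import os


def extract_custom_met(file_path):
    """Hypothetical stand-in for a custom extractor run on a local file.

    A real extractor would open the file and read product-specific fields;
    this one derives two illustrative fields from the file name alone.
    """
    name = os.path.basename(file_path)
    _, ext = os.path.splitext(name)
    return {"Filename": [name], "FileExtension": [ext.lstrip(".")]}


def merge_metadata(download_met, custom_met):
    """Layer custom metadata on top of download-time metadata.

    Both arguments map a key to a list of values. Existing values are kept,
    and new values are appended in order without introducing duplicates.
    """
    merged = {key: list(values) for key, values in download_met.items()}
    for key, values in custom_met.items():
        bucket = merged.setdefault(key, [])
        for value in values:
            if value not in bucket:
                bucket.append(value)
    return merged
```

For example, merge_metadata(download_met, extract_custom_met(path)) would produce the combined record that the second stage hands on for ingestion.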

HTH
-brian

On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" <Ri...@jpl.nasa.gov> wrote:

> Hi All -
>
> I'm wondering if anyone has experience with, or knows the details of how to use custom MetExtractors on products that are remotely downloaded via PushPull. 
>
> By default, PushPull performs some basic met-extraction and creates a ".tmp" file associated with downloaded products, but I'm wondering whether this met generation step is customizable.
>
> I've looked through the configuration files (e.g. [1], [2]) as well as the code for PushPull, but I can't seem to locate configuration parameters to support the invocation of custom met extractors on downloaded data.
>
> If any of you have experience with this, or can point me on where to look, I'd really appreciate it.
>
> Thanks! 
> Rishi 
>
> --
> [1] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>  
> [2] http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
