Posted to dev@oodt.apache.org by Sheryl John <sh...@gmail.com> on 2011/09/15 01:39:46 UTC

PGETask Workflow Metadata

Hi,

I have defined some key-value pairs for a file (say, Output.csv) in a
metout-config.xml for my PGETask Workflow. However, after executing the
workflow, the metout-config.xml is not creating an Output.csv.cas file.

I want to be able to use the above keys/metadata later on in an SQL-like
query from the pgeconfig file.
For example, if I've defined 'RecordID' as a key in the metout-config.xml, I
would want to use this metadata in the following query:

SQL(FORMAT='$FileLocation/$Filename'){ SELECT
FileLocation,Filename,ISMTable,*RecordID* FROM ISMRawData WHERE ISMTable =
'Chartevents'  AND *RecordID* = "PID"}

The other keys included in the query above are elements and product-types
that were defined during ingestion in the File Manager.
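
To make the intent of the FORMAT clause concrete, here is a minimal Python
sketch of how I understand it: the '$Key' placeholders are filled in once per
result returned by the query. This is only an illustration of the idea, not
the CAS-PGE implementation; the function name format_results and the sample
row are hypothetical, and the per-result expansion is my assumption.

```python
from string import Template


def format_results(fmt, rows):
    """Expand a '$Key'-style format string once per query result row.

    Assumption: each row maps a metadata key to a single string value.
    """
    template = Template(fmt)
    # substitute() raises KeyError if the format names a key the row lacks.
    return [template.substitute(row) for row in rows]


# Hypothetical row standing in for one File Manager query result.
rows = [
    {
        "FileLocation": "/data/archive",
        "Filename": "chartevents_1.csv",
        "ISMTable": "Chartevents",
        "RecordID": "PID",
    },
]

paths = format_results("$FileLocation/$Filename", rows)
```

In this sketch, a format string that names a key missing from a row fails
with a KeyError, which is the kind of failure I would like to rule out for
RecordID.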

So, right now, the task fails to parse the above query when I run the
workflow. Is this because metout-config is not creating the Output.csv.cas
file? And what's the best way to specify metadata files for a group of files
or for a folder?

Thanks,

On Fri, Sep 9, 2011 at 10:07 PM, Sheryl John <sh...@gmail.com> wrote:

> Oh, OK. So, it adds workflow metadata to the existing metadata.
> As you suggest, I will continue querying from the PGE Config for pulling
> specific files.
>
> I'll be glad to contribute to the merging of the PGETask Workflow
> Pre-condition to the trunk 0.4.
>
> Thanks Chris!
>
>
> On Fri, Sep 9, 2011 at 8:12 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Sheryl,
>>
>> On Sep 9, 2011, at 6:10 PM, Sheryl John wrote:
>>
>> > Hi,
>> >
>> > I have questions regarding the Workflow Manager, particularly the Met
>> File writers and querying data from the File Manager.
>> >
>> > 1) For files that are required for a PGE task workflow, do I specify the
>> metadata key-value pairs of the file (e.g., Key="TableName" Val="Chartevents")
>> in the metout-config.xml?
>>
>> Basically metout-config.xml is for the specific
>> MetadataListPcsMetFileWriter [1] instance configured in your CAS-PGE
>> pge-config.xml file.
>> This file defines metadata to pull out of the workflow context metadata,
>> and to write (and merge) with the rest of the file metadata for the product
>> you are about to ingest. So, putting a key in metout-config.xml is like
>> saying "I'd like to copy this workflow context metadata to the file product
>> metadata".
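
The copy-and-merge behaviour described above can be sketched roughly as
follows. This is a plain-Python illustration under stated assumptions, not
the MetadataListPcsMetFileWriter code: metadata is modelled as a dict of
key -> list of values, the function name merge_workflow_metadata is
hypothetical, and appending values (rather than replacing them) is an
assumption about what "merge" means here.

```python
def merge_workflow_metadata(product_met, workflow_met, keys_to_copy):
    """Copy selected workflow context keys into a product's metadata.

    keys_to_copy plays the role of the keys listed in metout-config.xml.
    Assumption: values for an existing key are appended, not overwritten.
    """
    # Copy the value lists so the caller's product metadata is not mutated.
    merged = {key: list(values) for key, values in product_met.items()}
    for key in keys_to_copy:
        if key in workflow_met:
            merged.setdefault(key, []).extend(workflow_met[key])
    return merged


# Hypothetical sample data.
product_met = {"Filename": ["Output.csv"]}
workflow_met = {"RecordID": ["PID"], "Unlisted": ["ignored"]}

merged = merge_workflow_metadata(product_met, workflow_met, ["RecordID"])
```

Only the keys named in keys_to_copy reach the product metadata; other
workflow context keys (here "Unlisted") are left out.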
>>
>> > And, how does the Workflow mgr use the values for the next step/task in
>> the pipeline?
>>
>> See above.
>>
>> >
>> >
>> > 2) To query the File Manager, should I use the Query building option
>> available in the PGETask Workflow Pre-Condition?
>> >   I have previously used the SQL-like query in a PGE config file to pull
>> ingested files, but after reading the Workflow 2 Guide, I was wondering if
>> the File Mgr querying should be done in the pre-condition of a task. So, this
>> would be a pre-condition for checking if input files are available before a
>> task begins. Is this right?
>>
>> To use the condition based querying from PGETaskWorkflowCondition, you'd
>> need to use wengine-branch.
>>
>> Rather than do so, I'd recommend just wiring the SQL(... query into your
>> CAS PGE config and doing the querying there. It'll be
>> simpler, and I have plans to merge in PGETask Workflow Pre-Condition later
>> into the trunk for 0.4. If that's something you'd like
>> to help with, I'd love to see a JIRA issue and a patch and I'll happily
>> shepherd it in.
>>
>> Thanks Sheryl!
>>
>> Cheers,
>> Chris
>>
>> [1] http://s.apache.org/bW4
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
>
> --
> -Sheryl
>



-- 
-Sheryl