Posted to dev@airavata.apache.org by Sanjaya Medonsa <sa...@gmail.com> on 2013/07/08 09:09:32 UTC

Re: Apache Airavata-OODT Integration

Hi Chris,
     I have started changing the current implementation to use the file
name instead of the product id. The current PGETask wrapper
implementation accepts one of two inputs: a product id, or a file path at
the remote location. If a file path is used, force staging should be set,
but I am not quite sure what force staging means. If I am to use the
current provisions in PGETaskWrapper, then the remote file path (not the
file name) has to be given as input. I am not sure whether it is ideal to
use the file path instead of the file name. If the file name is to be used
as input, then FilesStager needs to be customized to retrieve product
references from the file name. The file manager client doesn't have a
mechanism to retrieve a product by file name, but it does have one to
retrieve a product by product name. I guess the two are typically the
same. One drawback of this approach is that it doesn't support a list of
product names. The method getProductReferences, which returns a list of
products, relies on a back-end implementation that is keyed on product id,
though its actual input is a Product (a Product with only the product name
set cannot be used as input). Please let me know your thoughts.
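
To make the file-name lookup concrete, the selection step can be sketched
as below. This is a minimal illustration in Python, chosen only for
brevity; it does not call the OODT file manager client. The CatalogEntry
type, its fields, and pick_latest_by_filename are hypothetical names, and
the sketch assumes a catalog query has already returned candidate entries.
Because a file name is not guaranteed to be unique, it takes the most
recently received match.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical stand-in for one catalog hit; not an OODT class."""
    product_id: str
    filename: str
    received_time: datetime


def pick_latest_by_filename(
    entries: Iterable[CatalogEntry], filename: str
) -> Optional[CatalogEntry]:
    """Return the most recently received entry with this file name, or None."""
    matches = [e for e in entries if e.filename == filename]
    if not matches:
        return None
    # File names may repeat, so prefer the newest matching entry.
    return max(matches, key=lambda e: e.received_time)
```

Returning None for "no match" keeps the caller in charge of deciding
whether a missing file is an error or simply means nothing needs staging.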

Best Regards,
Sanjaya




On Mon, Jun 17, 2013 at 5:52 PM, Sanjaya Medonsa <sa...@gmail.com> wrote:

> Thanks Chris. I'll update the implementation to use file name instead of
> OODT product id.
>
> Cheers,
> Sanjaya
>
>
> On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Sanjaya, sure +1 use the Filename. It's not guaranteed to be unique,
>> but you can easily just pop the first one off the top (latest) and take
>> that (since it's sorted by product received time). You may check out the
>> pcs-core module and some of its internal classes like FileManagerUtils
>> to see some cool helper functions that could aid in this regard.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Sanjaya Medonsa <sa...@gmail.com>
>> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> Date: Saturday, June 15, 2013 4:04 AM
>> To: Airavata Dev <de...@airavata.apache.org>
>> Subject: Re: Apache Airavata-OODT Integration
>>
>> >Thanks Chris for your help! The working directory is available in the
>> >JobExecutionContext in Airavata and can easily be retrieved. The issue
>> >in my case is that the XBaya GUI takes a product id as input, not the
>> >file name. Internally, the file stager queries the file manager using
>> >the product id to retrieve the product reference and the corresponding
>> >file name, then stages the file into the input dir. Since this product
>> >id to file name mapping happens internally during file staging, my
>> >implementation doesn't have access to the file name unless I query the
>> >file manager again to retrieve it using the product id.
>> >
>> >One of the major issues in my implementation seems to be that I use the
>> >OODT product id as input, not the file name. Should I change my
>> >implementation to use the file name instead of the product id?
>> >
>> >Best Regards,
>> >Sanjaya
>> >
>> >
>> >On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) <
>> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> >
>> >> Hey Sanjaya,
>> >>
>> >> Easy, see the attached PGEConfig.xml here:
>> >>
>> >> http://paste.apache.org/6OGW
>> >>
>> >> In that file:
>> >>
>> >> 1. We compute the staged file path by computing JobDir
>> >> 2. We create in the exe block a staged input dir
>> >> 3. We stage the files just using cps in the exeBlock (could have
>> >> just as easily used fileStager)
>> >> 4. We know that the file is [JobInputDir]/[Filename]
>> >>
>> >> HTH.
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> -----Original Message-----
>> >> From: Sanjaya Medonsa <sa...@gmail.com>
>> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> >> Date: Friday, June 14, 2013 5:02 AM
>> >> To: Airavata Dev <de...@airavata.apache.org>
>> >> Subject: Re: Apache Airavata-OODT Integration
>> >>
>> >> >Thanks Chris for your input. I actually use the PGETaskInstance for
>> >> >file staging with minimal additional code, but my issue is not with
>> >> >the file staging itself. In my current implementation, the
>> >> >application takes a product id as input and then uses the
>> >> >capabilities of the PGETaskInstance class to stage the file. During
>> >> >staging, the product is mapped to a file in the specified working
>> >> >directory, and I don't have a way to retrieve the staged file name
>> >> >because it is not recorded in the Metadata. (For this purpose, I
>> >> >query the FileManager again to get the corresponding reference name
>> >> >for a given product id.) I need the staged file path because I
>> >> >replace the input product id with the staged file path prior to the
>> >> >actual workflow invocation. Basically, I am looking for an
>> >> >implementation where I can easily retrieve the staged file path for
>> >> >a given product id.
>> >> >
>> >> >Cheers,
>> >> >Sanjaya
>> >> >
>> >> >
>> >> >On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
>> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> >> >
>> >> >> Hi Sanjaya,
>> >> >>
>> >> >> -----Original Message-----
>> >> >>
>> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> >> >> Date: Monday, June 10, 2013 5:20 PM
>> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
>> >> >> Subject: Re: Apache Airavata-OODT Integration
>> >> >>
>> >> >> >Hi Chris,
>> >> >> >       On configuration, I have gotten rid of all the
>> >> >> >configuration files, including pge-config.xml. All the required
>> >> >> >configurations are set programmatically. Settings such as the
>> >> >> >FileManagerServer URL are configured in the
>> >> >> >airavata-server.properties file. I'll update the review request
>> >> >> >with the modified details.
>> >> >>
>> >> >> Great work!
>> >> >>
>> >> >>
>> >> >> >       I am still not quite clear on how to retrieve the staged
>> >> >> >file path properly. Currently I am using the getStagedFilePath
>> >> >> >method in ApacheAiravataWorkFlowInstanceImpl to regenerate the
>> >> >> >staged file path. While going through the OODT code, I saw a
>> >> >> >method in DataTransferer that notifies the FileManagerServer once
>> >> >> >a transfer is completed, but I couldn't find the same for product
>> >> >> >retrieval.
>> >> >>
>> >> >> Example:
>> >> >>
>> >> >> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
>> >> >>
>> >> >> Review Board tickets:
>> >> >> https://reviews.apache.org/r/4746/
>> >> >>
>> >> >> https://reviews.apache.org/r/5382/
>> >> >>
>> >> >>
>> >> >> JIRA issue source (in OODT since 0.4):
>> >> >>   https://issues.apache.org/jira/browse/OODT-443
>> >> >>
>> >> >>
>> >> >> >       As you suggested, I'll improve my workflow using Apache
>> >> >> >Tika. I'd like to continue this as a parallel task. While
>> >> >> >modifying the staging implementation based on community feedback,
>> >> >> >I am currently looking at ingesting output back into OODT.
>> >> >>
>> >> >> See above for info on file staging. I would strongly encourage you
>> >> >> not to reimplement CAS-PGE in Airavata -- it's pretty functional
>> >> >> and expressive anyways and I would work to figure out how to make
>> >> >> Airavata leverage CAS-PGE.
>> >> >>
>> >> >> Cheers,
>> >> >> Chris
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
>> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> >> >> >
>> >> >> >> Hi Sanjaya,
>> >> >> >>
>> >> >> >> I think starting out with /bin/ls would be good, maybe like a
>> >> >> >> /bin/ls workflow, and then for each file returned, maybe run
>> >> >> >> Apache Tika and extract its metadata and then pipe that to a
>> >> >> >> file?
>> >> >> >>
>> >> >> >> How about that?
>> >> >> >>
>> >> >> >> Cheers,
>> >> >> >> Chris
>> >> >> >>
>> >> >> >>
>> >> >> >> -----Original Message-----
>> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>> >> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> >> >> >> Date: Tuesday, June 4, 2013 5:31 AM
>> >> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> >> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
>> >> >> >> Subject: Re: Apache Airavata-OODT Integration
>> >> >> >>
>> >> >> >> >Hi Chris,
>> >> >> >> >     Please see my comments below on the two items.
>> >> >> >> >
>> >> >> >> >Configuration: It should be possible to set them
>> >> >> >> >programmatically. I have already implemented part of this for
>> >> >> >> >the file staging information, and I'll work to get rid of the
>> >> >> >> >other configuration files.
>> >> >> >> >
>> >> >> >> >Staged File Path: I'll work on the suggested approach, though I
>> >> >> >> >don't fully understand it at the moment. I guess I need to read
>> >> >> >> >a bit more about CAS-PGE and come back to you on the proposed
>> >> >> >> >approach.
>> >> >> >> >
>> >> >> >> >Currently I am testing this by wrapping the /bin/ls command as a
>> >> >> >> >GFac service. I may need to test it with a real workflow. Could
>> >> >> >> >you please give me some guidance on a better scenario to test
>> >> >> >> >this?
>> >> >> >> >
>> >> >> >> >Cheers,
>> >> >> >> >Sanjaya
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
>> >> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> >> >> >> >
>> >> >> >> >> Hi Sanjaya,
>> >> >> >> >>
>> >> >> >> >> -----Original Message-----
>> >> >> >> >>
>> >> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>> >> >> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>> >> >> >> >> Date: Thursday, May 30, 2013 5:12 AM
>> >> >> >> >> To: "dev@oodt.apache.org" <de...@oodt.apache.org>,
>> >> >> >> >>"dev@airavata.apache.org"
>> >> >> >> >> <de...@airavata.apache.org>
>> >> >> >> >> Subject: Apache Airavata-OODT Integration
>> >> >> >> >>
>> >> >> >> >> >Hi,
>> >> >> >> >> >     I have worked on the Apache Airavata integration with
>> >> >> >> >> >Apache OODT. As a first step, I have implemented integration
>> >> >> >> >> >with the Apache OODT file manager component.
>> >> >> >> >>
>> >> >> >> >> Great work!!
>> >> >> >> >>
>> >> >> >> >> Comments below:
>> >> >> >> >>
>> >> >> >> >> >      1. Introduced a new GFac schema type called
>> >> >> >> >> >OODTProduct, which takes Apache OODT product IDs as input.
>> >> >> >> >> >      2. Implemented a new pre GFac handler by extending the
>> >> >> >> >> >Apache OODT PgeTaskInstance to stage the corresponding file
>> >> >> >> >> >into the working directory.
>> >> >> >> >> >      3. Once the file is staged, the input parameter holding
>> >> >> >> >> >the OODT product id is replaced with the path of the staged
>> >> >> >> >> >file for downstream processing.
>> >> >> >> >> >
>> >> >> >> >> >I have tested the implementation with a GFac application that
>> >> >> >> >> >wraps the /bin/ls command. The application takes a product id
>> >> >> >> >> >as input, stages the corresponding file into the working
>> >> >> >> >> >directory, and executes /bin/ls against the staged file. I
>> >> >> >> >> >hope this is a valid testing scenario.
>> >> >> >> >> >
>> >> >> >> >> >Concerns
>> >> >> >> >> >- Configurations: I have added a new configuration file named
>> >> >> >> >> >oodt-integration.properties in addition to the
>> >> >> >> >> >dynamic_metadata.met and pge-config.xml files used by OODT. At
>> >> >> >> >> >the moment, however, nothing is configured in
>> >> >> >> >> >oodt-integration.properties.
>> >> >> >> >>
>> >> >> >> >> You probably only need the pge-config.xml file. Dynamic
>> >> >> >> >> metadata and the task configuration properties can be
>> >> >> >> >> specified programmatically, right?
>> >> >> >> >>
>> >> >> >> >> >- Staged File Name: With the current implementation of
>> >> >> >> >> >PgeTaskInstance it is not possible to retrieve the path of the
>> >> >> >> >> >staged file. Due to this limitation, I query the
>> >> >> >> >> >FileManagerServer with the product id, retrieve the file name,
>> >> >> >> >> >and compute the file path from the working directory.
>> >> >> >> >>
>> >> >> >> >> I'm not sure I understand this? If you store and record the
>> >> >> >> >> Filename and FileLocation metadata fields, then you can easily
>> >> >> >> >> retrieve the staged file path via a SQL query via CAS-PGE by
>> >> >> >> >> simply setting FORMAT=('$FileLocation/$Filename') in the
>> >> >> >> >> response. Can you comment on this?
>> >> >> >> >>
>> >> >> >> >> >- Currently it is not possible to execute the workflow using
>> >> >> >> >> >XBaya because validation fails on the new schema type. I have
>> >> >> >> >> >commented out the relevant validation code for testing
>> >> >> >> >> >purposes.
>> >> >> >> >>
>> >> >> >> >> OK, will probably need to work on this.
>> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >Currently I am having an issue with the Review Board client
>> >> >> >> >> >tool and need to resolve it to upload the code for review.
>> >> >> >> >>
>> >> >> >> >> I see later that you got this working, so will head over and
>> >> >> >> >> review that now.
>> >> >> >> >>
>> >> >> >> >> Thanks!
>> >> >> >> >>
>> >> >> >> >> Cheers,
>> >> >> >> >> Chris
>> >> >> >> >>
>> >> >> >> >>

Re: Apache Airavata-OODT Integration

Posted by Sanjaya Medonsa <sa...@gmail.com>.
Hi Chris,
    If a distributed file system is used to make the file path local, I
guess LocalDataTransferer should be used instead of RemoteDataTransferer.
Although the data transfer mechanism is configurable, there is an issue
with file staging: with the current implementation, using
LocalDataTransferer throws a NullPointerException. The problem is that
the FileManagerFileStager class creates the product for data transfer
without a product structure, but the retrieveProduct method of
LocalDataTransferer always expects one. This could be avoided by
extending the FileManagerFileStager class. I need to know whether using
LocalDataTransferer is the correct approach when a distributed file
system is used to make the file path local.
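
The shape of that workaround can be sketched as follows. This is a
Python illustration of the idea only, not OODT code: Product,
ensure_structure, retrieve, and the default value "Flat" are all
assumptions made for the sketch. The point is that the stager fills in a
missing product structure before handing the product to the transferer,
so the transferer never sees an unset value.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_STRUCTURE = "Flat"  # assumed default for single-file products


@dataclass
class Product:
    """Hypothetical minimal product record; not the OODT Product class."""
    name: str
    structure: Optional[str] = None


def ensure_structure(product: Product, default: str = DEFAULT_STRUCTURE) -> Product:
    """Return a product whose structure is set, using the default if missing."""
    if product.structure is None:
        return Product(name=product.name, structure=default)
    return product


def retrieve(product: Product) -> str:
    """Stand-in for a transferer that requires a structure to be present."""
    if product.structure is None:
        raise ValueError("product structure is not set")
    return f"retrieved {product.name} as {product.structure}"
```

An extended stager would call something like ensure_structure just before
delegating to the transferer; products that already carry a structure pass
through unchanged.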

Cheers,
Sanjaya


On Mon, Jul 8, 2013 at 7:44 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Sanjaya,
>
> -----Original Message-----
>
> From: Sanjaya Medonsa <sa...@gmail.com>
> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> Date: Monday, July 8, 2013 12:09 AM
> To: Airavata Dev <de...@airavata.apache.org>
> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
> Subject: Re: Apache Airavata-OODT Integration
>
> >Hi Chris,
> >     I have started changing the current implementation to use the file
> >name instead of the product id. The current PGETask wrapper
> >implementation accepts one of two inputs: a product id, or a file path
> >at the remote location. If a file path is used, force staging should be
> >set, but I am not quite sure what force staging means.
>
> Force staging I believe controls whether or not the staged files are
> overwritten.
>
> >If I am to use the current provisions in PGETaskWrapper, then the remote
> >file path (not the file name) has to be given as input. I am not sure
> >whether it is ideal to use the file path instead of the file name.
>
> You can easily generate the file path. It does not have to be remote; in
> fact, it could easily be local. In Apache OODT, we typically ensure it's
> local by using distributed filesystems like HDFS, NFS, or Gluster to make
> remote files appear local, pushing that portion down into the distributed
> filesystem, which we think does a better job of data movement :). To
> generate the file path, you can use the CAS-PGE SQLQuery facility, which
> lets you look up e.g. $FileLocation/$Filename based on met fields, which
> in turn you can then feed into the path.
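
The $FileLocation/$Filename lookup amounts to substituting met field
values into a format string. A minimal sketch of that substitution
follows, in Python for brevity. It only imitates the idea and is not how
CAS-PGE evaluates its queries; the function name format_path and the
sample met values in the usage note are hypothetical.

```python
from string import Template
from typing import Mapping


def format_path(fmt: str, met: Mapping[str, str]) -> str:
    """Fill $Name placeholders in fmt from the met fields.

    Raises KeyError if a placeholder has no matching met field, so a
    missing FileLocation or Filename is reported instead of silently
    producing a broken path.
    """
    return Template(fmt).substitute(met)
```

For example, format_path("$FileLocation/$Filename", {"FileLocation":
"/data/archive", "Filename": "input.dat"}) yields
"/data/archive/input.dat".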
>
>
> >If the file name is to be used as input, then FilesStager needs to be
> >customized to retrieve product references from the file name.
>
> See above for an alternative.
>
> >The file manager client doesn't have a mechanism to retrieve a product
> >by file name, but it does have one to retrieve a product by product
> >name. I guess the two are typically the same.
>
> Yeah, or the other easy mechanism is simply to issue a query, e.g., build
> yourself a Filename query and then query the FM Catalog.
>
> >One drawback of this approach is that it doesn't support a list of
> >product names. The method getProductReferences, which returns a list of
> >products, relies on a back-end implementation that is keyed on product
> >id, though its actual input is a Product (a Product with only the
> >product name set cannot be used as input). Please let me know your
> >thoughts.
>
> See above.
>
> Cheers,
> Chris
>
> >On Mon, Jun 17, 2013 at 5:52 PM, Sanjaya Medonsa <sa...@gmail.com> wrote:
> >
> >> Thanks Chris. I'll update the implementation to use file name instead of
> >> OODT product id.
> >>
> >> Cheers,
> >> Sanjaya
> >>
> >>
> >> On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) <
> >> chris.a.mattmann@jpl.nasa.gov> wrote:
> >>
> >>> Hey Sanjaya, sure +1 use the Filename. It's not guaranteed to be
> >>> unique, but you can easily just pop the first one off the top (latest)
> >>> and take that (since it's sorted by product received time). You may
> >>> check out the pcs-core module and some of its internal classes like
> >>> FileManagerUtils to see some cool helper functions that could aid in
> >>> this regard.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> -----Original Message-----
> >>> From: Sanjaya Medonsa <sa...@gmail.com>
> >>> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> Date: Saturday, June 15, 2013 4:04 AM
> >>> To: Airavata Dev <de...@airavata.apache.org>
> >>> Subject: Re: Apache Airavata-OODT Integration
> >>>
> >>> >Thanks Chris for your help! The working directory is available in
> >>> >the JobExecutionContext in Airavata and can easily be retrieved. The
> >>> >issue in my case is that the XBaya GUI takes a product id as input,
> >>> >not the file name. Internally, the file stager queries the file
> >>> >manager using the product id to retrieve the product reference and
> >>> >the corresponding file name, then stages the file into the input
> >>> >dir. Since this product id to file name mapping happens internally
> >>> >during file staging, my implementation doesn't have access to the
> >>> >file name unless I query the file manager again to retrieve it using
> >>> >the product id.
> >>> >
> >>> >One of the major issues in my implementation seems to be that I use
> >>> >the OODT product id as input, not the file name. Should I change my
> >>> >implementation to use the file name instead of the product id?
> >>> >
> >>> >Best Regards,
> >>> >Sanjaya
> >>> >
> >>> >
> >>> >On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) <
> >>> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >>> >
> >>> >> Hey Sanjaya,
> >>> >>
> >>> >> Easy, see the attached PGEConfig.xml here:
> >>> >>
> >>> >> http://paste.apache.org/6OGW
> >>> >>
> >>> >> In that file:
> >>> >>
> >>> >> 1. We compute the staged file path by computing JobDir
> >>> >> 2. We create in the exe block a staged input dir
> >>> >> 3. We stage the files just using cps in the exeBlock (could have
> >>> >> just as easily used fileStager)
> >>> >> 4. We know that the file is [JobInputDir]/[Filename]
> >>> >>
> >>> >> HTH.
> >>> >>
> >>> >> Cheers,
> >>> >> Chris
> >>> >>
> >>> >> -----Original Message-----
> >>> >> From: Sanjaya Medonsa <sa...@gmail.com>
> >>> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> Date: Friday, June 14, 2013 5:02 AM
> >>> >> To: Airavata Dev <de...@airavata.apache.org>
> >>> >> Subject: Re: Apache Airavata-OODT Integration
> >>> >>
> >>> >> >Thanks Chris for your input. I actually use the PGETaskInstance
> >>> >> >for file staging with minimal additional code, but my issue is not
> >>> >> >with the file staging itself. In my current implementation, the
> >>> >> >application takes a product id as input and then uses the
> >>> >> >capabilities of the PGETaskInstance class to stage the file.
> >>> >> >During staging, the product is mapped to a file in the specified
> >>> >> >working directory, and I don't have a way to retrieve the staged
> >>> >> >file name because it is not recorded in the Metadata. (For this
> >>> >> >purpose, I query the FileManager again to get the corresponding
> >>> >> >reference name for a given product id.) I need the staged file
> >>> >> >path because I replace the input product id with the staged file
> >>> >> >path prior to the actual workflow invocation. Basically, I am
> >>> >> >looking for an implementation where I can easily retrieve the
> >>> >> >staged file path for a given product id.
> >>> >> >
> >>> >> >Cheers,
> >>> >> >Sanjaya
> >>> >> >
> >>> >> >
> >>> >> >On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
> >>> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >>> >> >
> >>> >> >> Hi Sanjaya,
> >>> >> >>
> >>> >> >> -----Original Message-----
> >>> >> >>
> >>> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
> >>> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> >> Date: Monday, June 10, 2013 5:20 PM
> >>> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
> >>> >> >> Subject: Re: Apache Airavata-OODT Integration
> >>> >> >>
> >>> >> >> >Hi Chris,
> >>> >> >> >       On configuration, I have gotten rid of all the
> >>> >> >> >configuration files, including pge-config.xml. All the required
> >>> >> >> >configurations are set programmatically. Settings such as the
> >>> >> >> >FileManagerServer URL are configured in the
> >>> >> >> >airavata-server.properties file. I'll update the review request
> >>> >> >> >with the modified details.
> >>> >> >>
> >>> >> >> Great work!
> >>> >> >>
> >>> >> >>
> >>> >> >> >       I am still not quite clear on how to retrieve the staged
> >>> >> >> >file path properly. Currently I am using the getStagedFilePath
> >>> >> >> >method in ApacheAiravataWorkFlowInstanceImpl to regenerate the
> >>> >> >> >staged file path. While going through the OODT code, I saw a
> >>> >> >> >method in DataTransferer that notifies the FileManagerServer
> >>> >> >> >once a transfer is completed, but I couldn't find the same for
> >>> >> >> >product retrieval.
> >>> >> >>
> >>> >> >> Example:
> >>> >> >>
> >>> >> >> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
> >>> >> >>
> >>> >> >>
> >>> >> >> Review Board tickets:
> >>> >> >> https://reviews.apache.org/r/4746/
> >>> >> >>
> >>> >> >> https://reviews.apache.org/r/5382/
> >>> >> >>
> >>> >> >>
> >>> >> >> JIRA issue source (in OODT since 0.4):
> >>> >> >>   https://issues.apache.org/jira/browse/OODT-443
> >>> >> >>
> >>> >> >>
> >>> >> >> >       As you suggested, I'll improve my workflow using Apache
> >>> >> >> >Tika. I'd like to continue this as a parallel task. While
> >>> >> >> >modifying the staging implementation based on community
> >>> >> >> >feedback, I am currently looking at ingesting output back into
> >>> >> >> >OODT.
> >>> >> >>
> >>> >> >> See above for info on file staging. I would strongly encourage
> >>> >> >> you not to reimplement CAS-PGE in Airavata -- it's pretty
> >>> >> >> functional and expressive anyways and I would work to figure out
> >>> >> >> how to make Airavata leverage CAS-PGE.
> >>> >> >>
> >>> >> >> Cheers,
> >>> >> >> Chris
> >>> >> >>
> >>> >> >>
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
> >>> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >>> >> >> >
> >>> >> >> >> Hi Sanjaya,
> >>> >> >> >>
> >>> >> >> >> I think starting out with /bin/ls would be good, maybe like a
> >>> >> >> >> /bin/ls workflow, and then for each file returned, maybe run
> >>> >> >> >> Apache Tika and extract its metadata and then pipe that to a
> >>> >> >> >> file?
> >>> >> >> >>
> >>> >> >> >> How about that?
> >>> >> >> >>
> >>> >> >> >> Cheers,
> >>> >> >> >> Chris
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> -----Original Message-----
> >>> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
> >>> >> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> >> >> Date: Tuesday, June 4, 2013 5:31 AM
> >>> >> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
> >>> >> >> >> Subject: Re: Apache Airavata-OODT Integration
> >>> >> >> >>
> >>> >> >> >> >Hi Chris,
> >>> >> >> >> >     Please see my comments below on the two items.
> >>> >> >> >> >
> >>> >> >> >> >Configuration : It should be possible to set them programmatically.
> >>> >> >> >> >Actually I have implemented partly it for file staging information. I'll
> >>> >> >> >> >work to get rid of the other configuration files.
> >>> >> >> >> >
> >>> >> >> >> >Staged File Path : I'll work on the suggested approach, though I am not
> >>> >> >> >> >fully understand it at the moment. I guess I need to go through bit more on
> >>> >> >> >> >CAS-PGE and come back to you on the proposed approach.
> >>> >> >> >> >
> >>> >> >> >> >Currently I am testing this by wrapping /bin/ls command as GFac service. I
> >>> >> >> >> >may need to test this with real workflow. Could you please provide me know
> >>> >> >> >> >some guidance on better scenario to test this.
> >>> >> >> >> >
> >>> >> >> >> >Cheers,
> >>> >> >> >> >Sanjaya
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
> >>> >> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >>> >> >> >> >
> >>> >> >> >> >> Hi Sanjaya,
> >>> >> >> >> >>
> >>> >> >> >> >> -----Original Message-----
> >>> >> >> >> >>
> >>> >> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
> >>> >> >> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> >> >> >> Date: Thursday, May 30, 2013 5:12 AM
> >>> >> >> >> >> To: "dev@oodt.apache.org" <de...@oodt.apache.org>,
> >>> >> >> >> >> "dev@airavata.apache.org" <de...@airavata.apache.org>
> >>> >> >> >> >> Subject: Apache Airavata-OODT Integration
> >>> >> >> >> >>
> >>> >> >> >> >> >Hi,
> >>> >> >> >> >> >     I have worked on the Apache Airavata integration with Apache OODT. As
> >>> >> >> >> >> >a first step, I have implemented integration with Apache OODT file
> >>> >> >> >> >> >manager component.
> >>> >> >> >> >>
> >>> >> >> >> >> Great work!!
> >>> >> >> >> >>
> >>> >> >> >> >> Comments below:
> >>> >> >> >> >>
> >>> >> >> >> >> >      1. Introduce a new GFac Schema type called OODTProduct which takes
> >>> >> >> >> >> >Apache OODT product IDs as input.
> >>> >> >> >> >> >      2. Implemented new pre GFac Handler by extending Apache OODT
> >>> >> >> >> >> >PgeTaskInstance to stage the corresponding file into the working
> >>> >> >> >> >> >directory.
> >>> >> >> >> >> >      3. Once file is staged, input parameter with OODT product id is
> >>> >> >> >> >> >replaced with path of the staged file for downstream processing
> >>> >> >> >> >> >
> >>> >> >> >> >> >I have tested the implementation with Gfac application which wraps /bin/ls
> >>> >> >> >> >> >command. Application takes product id as input and stage corresponding file
> >>> >> >> >> >> >into the working directory and /bin/ls is executed against the staged file.
> >>> >> >> >> >> >Hope this is a valid testing scenario.
> >>> >> >> >> >> >
> >>> >> >> >> >> >Concerns
> >>> >> >> >> >> >- Configurations : I have added new configuration file named and
> >>> >> >> >> >> >oodt-integration.properties in addition to dynamic_metadata.met and
> >>> >> >> >> >> >pge-config.xml files used by OODT. But at the moment there is no item
> >>> >> >> >> >> >configured with the oodt-integration.properties.
> >>> >> >> >> >>
> >>> >> >> >> >> You probably only need the pge-config.xml file. Dynamic metadata, and the
> >>> >> >> >> >> task configuration properties can be specified programmatically, right?
> >>> >> >> >> >>
> >>> >> >> >> >> >- Staged File Name - With the current implementation of PgeTaskInstance it
> >>> >> >> >> >> >is not possible to retrieve path of the staged file. Due to this
> >>> >> >> >> >> >limitation, I have query the FileManagerServer with product id and retrieve
> >>> >> >> >> >> >the file name and computed the file path using information of working
> >>> >> >> >> >> >directory.
> >>> >> >> >> >>
> >>> >> >> >> >> I'm not sure I understand this? If you store and record the Filename, and
> >>> >> >> >> >> FileLocation
> >>> >> >> >> >> metadata files, then you can easily retrieve the staged file path via a
> >>> >> >> >> >> SQLquery
> >>> >> >> >> >> via CAS-PGE by simply setting the FORMAT=('$FileLocation/$Filename') in
> >>> >> >> >> >> the response.
> >>> >> >> >> >> Can you comment on this?
> >>> >> >> >> >>
> >>> >> >> >> >> >- Currently it is not possible to execute the workflow using Xbaya due to
> >>> >> >> >> >> >validation failure due to new schema type. I have commented out the
> >>> >> >> >> >> >relevant validation code for testing purpose.
> >>> >> >> >> >>
> >>> >> >> >> >> OK, will probably need to work on this.
> >>> >> >> >> >>
> >>> >> >> >> >> >
> >>> >> >> >> >> >Currently I am having an issue with review board client tool and need to
> >>> >> >> >> >> >resolve it to upload the code for review.
> >>> >> >> >> >>
> >>> >> >> >> >> I see later that you got this working, so will head over and review that
> >>> >> >> >> >> now.
> >>> >> >> >> >>
> >>> >> >> >> >> Thanks!
> >>> >> >> >> >>
> >>> >> >> >> >> Cheers,
> >>> >> >> >> >> Chris
> >>> >> >> >> >>
> >>> >> >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> >> >> >> >> Chris Mattmann, Ph.D.
> >>> >> >> >> >> Senior Computer Scientist
> >>> >> >> >> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> >> >> >> >> Office: 171-266B, Mailstop: 171-246
> >>> >> >> >> >> Email: chris.a.mattmann@nasa.gov
> >>> >> >> >> >> WWW:  http://sunset.usc.edu/~mattmann/
> >>> >> >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> >> >> >> >> Adjunct Assistant Professor, Computer Science Department
> >>> >> >> >> >> University of Southern California, Los Angeles, CA 90089 USA
> >>> >> >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >>
> >>> >> >>
> >>> >>
> >>> >>
> >>>
> >>>
> >>
>
>

Re: Apache Airavata-OODT Integration

Posted by Sanjaya Medonsa <sa...@gmail.com>.
Hi Chris,
    If a distributed file system is used to make the file path local, I
guess LocalDataTransferer should be used instead of RemoteDataTransferer.
Although the data transfer mechanism is configurable, there is an issue
with file staging. If LocalDataTransferer is used, the current
implementation throws a NullPointerException. The problem is that the
FileManagerFileStager class creates a product for data transfer without a
product structure, but the retrieveProduct method of LocalDataTransferer
always expects a product structure. This could be avoided by extending the
FileManagerFileStager class. I need to know whether using
LocalDataTransferer is the correct approach if a distributed file system is
used to make the file path local.
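
To illustrate the failure, here is a minimal sketch using hypothetical
stand-in classes. SketchProduct, SketchLocalTransferer, SketchStager and the
"Flat" structure value are illustrative assumptions, not the real OODT types
or signatures. It only shows how a retrieval step that dereferences the
product structure fails when the stager leaves it unset, and how a stager
extension that fills in a default could avoid that.

```java
// Hypothetical stand-in for a product; not the real OODT Product class.
class SketchProduct {
    String name;
    String structure; // may be left null by the stager

    SketchProduct(String name, String structure) {
        this.name = name;
        this.structure = structure;
    }
}

// Hypothetical stand-in for a local transferer that depends on the structure.
class SketchLocalTransferer {
    String retrieve(SketchProduct p) {
        // Dereferencing a null structure here raises the NullPointerException.
        if (p.structure.equals("Flat")) {
            return "copied flat product " + p.name;
        }
        return "copied hierarchical product " + p.name;
    }
}

// Hypothetical stager extension that supplies a default structure first.
class SketchStager {
    static SketchProduct withDefaultStructure(SketchProduct p) {
        if (p.structure == null) {
            p.structure = "Flat"; // assumed default, for illustration only
        }
        return p;
    }

    public static void main(String[] args) {
        SketchLocalTransferer transferer = new SketchLocalTransferer();
        SketchProduct bare = new SketchProduct("input.dat", null);
        try {
            transferer.retrieve(bare);
        } catch (NullPointerException e) {
            System.out.println("structure missing, retrieval failed");
        }
        System.out.println(
            transferer.retrieve(SketchStager.withDefaultStructure(bare)));
    }
}
```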

Cheers,
Sanjaya


On Mon, Jul 8, 2013 at 7:44 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Sanjaya,
>
> -----Original Message-----
>
> From: Sanjaya Medonsa <sa...@gmail.com>
> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
> Date: Monday, July 8, 2013 12:09 AM
> To: Airavata Dev <de...@airavata.apache.org>
> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
> Subject: Re: Apache Airavata-OODT Integration
>
> >Hi Chris,
> >     I have started looking at changing the current implementation to use
> >file Name instead of product id. As per the current PGETask wrapper
> >implementation, it takes two inputs (Product ID or file path at the remote
> >location. If filePath is used force staging should be set. But I am not
> >quite sure what it means by force staging).
>
> Force staging I believe controls whether or not the staged files are
> overwritten.
>
> > If I am to use the current
> >provisions in PGETaskWrapper, then remote file path (Not the file  name)
> >has to be given as input. I am not quite sure whether it is ideal to use
> >file path instead of file name.
>
> You can easily generate the file path (which does not have to be remote,
> in fact, if you think about it, it could easily be local and in Apache OODT,
> we typically ensure it's local by using distributed filesystems like HDFS
> or NFS or Gluster to make remote files appear local by pushing that portion
> down into the distributed filesystem which we think does a better job of
> data movement :) ). To generate the file path you can use CAS-PGE SQLQuery
> facility that will allow you to look up e.g., $FileLocation/$Filename based
> on met fields, which in turn you can then feed into the path.
>
>
> >If filename to use as input, then
> >FilesStager needs to be customized to  retrieve product references from
> >file name.
>
> See above for an alternative.
>
> >File manager client doesn't have a mechanism to retrieve product
> >by file name. But it has mechanism to retrieve product by product name. I
> >guess typically both are the same.
>
> Yeah, or the other easy mechanism is simply to issue a query, e.g., build
> yourself a Filename query and then query the FM Catalog.
>
> >One drawback of this approach is that it
> >doesn't support list of product names. The method getProductReferences
> >which returns list of products is based on back end implementation that is
> >based on product id, through actual input is product (Product with just
> >product name set is not possible to as input). Please let me know your
> >thoughts.
>
> See above.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Apache Airavata-OODT Integration

Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.
Hi Sanjaya,

-----Original Message-----

From: Sanjaya Medonsa <sa...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, July 8, 2013 12:09 AM
To: Airavata Dev <de...@airavata.apache.org>
Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
Subject: Re: Apache Airavata-OODT Integration

>Hi Chris,
>     I have started looking at changing the current implementation to use
>file Name instead of product id. As per the current PGETask wrapper
>implementation, it takes two inputs (Product ID or file path at the remote
>location. If filePath is used force staging should be set. But I am not
>quite sure what it means by force staging).

Force staging I believe controls whether or not the staged files are
overwritten.
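
If force staging does mean overwrite-on-stage, the behaviour would look
roughly like the sketch below. It uses only java.nio; the stage method and
its forceStage flag are hypothetical names for illustration and are not taken
from CAS-PGE.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ForceStageSketch {
    // Copies source into stagingDir. An existing staged copy is replaced
    // only when forceStage is true; otherwise it is left untouched.
    static Path stage(Path source, Path stagingDir, boolean forceStage)
            throws IOException {
        Files.createDirectories(stagingDir);
        Path target = stagingDir.resolve(source.getFileName());
        if (Files.exists(target) && !forceStage) {
            return target; // keep the previously staged file
        }
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("stage-sketch");
        Path source = Files.write(work.resolve("input.dat"), "v1".getBytes());
        Path staging = work.resolve("staged");
        stage(source, staging, false);
        Files.write(source, "v2".getBytes());
        stage(source, staging, false); // staged copy is not overwritten
        stage(source, staging, true);  // staged copy is overwritten
        System.out.println(
            new String(Files.readAllBytes(staging.resolve("input.dat"))));
    }
}
```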

> If I am to use the current
>provisions in PGETaskWrapper, then remote file path (Not the file  name)
>has to be given as input. I am not quite sure whether it is ideal to use
>file path instead of file name.

You can easily generate the file path, which does not have to be remote. In
fact, it could easily be local: in Apache OODT we typically ensure it's
local by using distributed filesystems like HDFS, NFS or Gluster to make
remote files appear local, pushing that portion down into the distributed
filesystem, which we think does a better job of data movement :). To
generate the file path you can use the CAS-PGE SQLQuery facility, which lets
you look up e.g., $FileLocation/$Filename based on met fields, which in turn
you can then feed into the path.
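
As a rough illustration of what a $FileLocation/$Filename format amounts to,
the sketch below substitutes $Key placeholders with values from a metadata
map. PathFormatSketch, its format method, and the sample values are
hypothetical; this is not the CAS-PGE implementation, only the idea of
building a path from the two met fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathFormatSketch {
    private static final Pattern KEY = Pattern.compile("\\$(\\w+)");

    // Replaces each $Key in the format string with the matching met value.
    static String format(String format, Map<String, String> met) {
        Matcher m = KEY.matcher(format);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = met.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException(
                    "no metadata for " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> met = new HashMap<>();
        met.put("FileLocation", "/data/archive"); // sample value
        met.put("Filename", "input.dat");         // sample value
        System.out.println(format("$FileLocation/$Filename", met));
    }
}
```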


>If filename to use as input, then
>FilesStager needs to be customized to  retrieve product references from
>file name. 

See above for an alternative.

>File manager client doesn't have a mechanism to retrieve product
>by file name. But it has mechanism to retrieve product by product name. I
>guess typically both are the same.

Yeah, or the other easy mechanism is simply to issue a query, e.g., build
yourself a Filename query and then query the FM Catalog.
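
A Filename lookup of this kind can be sketched against an in-memory list.
CatalogEntry and FilenameQuerySketch are hypothetical stand-ins, not the File
Manager client API. Since Filename is not guaranteed to be unique (as noted
earlier in this thread), the sketch takes the most recently received match.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical catalog record; not an OODT class.
class CatalogEntry {
    final String productId;
    final String filename;
    final long receivedTime;

    CatalogEntry(String productId, String filename, long receivedTime) {
        this.productId = productId;
        this.filename = filename;
        this.receivedTime = receivedTime;
    }
}

class FilenameQuerySketch {
    // Returns the most recently received entry with the given filename.
    static Optional<CatalogEntry> latestByFilename(
            List<CatalogEntry> catalog, String filename) {
        return catalog.stream()
            .filter(e -> e.filename.equals(filename))
            .max(Comparator.comparingLong((CatalogEntry e) -> e.receivedTime));
    }

    public static void main(String[] args) {
        List<CatalogEntry> catalog = Arrays.asList(
            new CatalogEntry("p1", "input.dat", 100L),
            new CatalogEntry("p2", "input.dat", 200L),
            new CatalogEntry("p3", "other.dat", 300L));
        latestByFilename(catalog, "input.dat")
            .ifPresent(e -> System.out.println(e.productId));
    }
}
```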

>One drawback of this approach is that it
>doesn't support list of product names. The method getProductReferences
>which returns list of products is based on back end implementation that is
>based on product id, through actual input is product (Product with just
>product name set is not possible to as input). Please let me know your
>thoughts.

See above.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



>
>
>
>
>On Mon, Jun 17, 2013 at 5:52 PM, Sanjaya Medonsa
><sa...@gmail.com>wrote:
>
>> Thanks Chris. I'll update the implementation to use file name instead of
>> OODT product id.
>>
>> Cheers,
>> Sanjaya
>>
>>
>> On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey Sanjaya, sure +1 use the Filename. It's not guaranteed to be
>>>unique,
>>> but you can easily just pop the first one off the top (latest) and take
>>> that (since it's sorted by product received time). You may check out
>>>the
>>> pcs-core module and some of its internal classes like FileManagerUtils
>>> to see some cool helper functions that could aid in this regard.
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>> -----Original Message-----
>>> From: Sanjaya Medonsa <sa...@gmail.com>
>>> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> Date: Saturday, June 15, 2013 4:04 AM
>>> To: Airavata Dev <de...@airavata.apache.org>
>>> Subject: Re: Apache Airavata-OODT Integration
>>>
>>> >Thanks Chris for your help! Working directory is available in
>>> >JobExecutionContext in Airavata and directory can easily be retrieved.
>>> >Issue in my case is that, from XBaya GUI I take product id as input
>>>not
>>> >the
>>> >file name. Internally file stager query the file manager using
>>>product id
>>> >to retrieve product reference and corresponding file name to stage the
>>> >file
>>> >into input dir. Since this product id to file name mapping happens
>>> >internally during the file staging, my implementation don't have
>>>access
>>> to
>>> >filename unless I query the file manager to retrieve the corresponding
>>> >file
>>> >name using product id.
>>> >
>>> >One of the major issue in my implementation seems that I use OODT
>>>product
>>> >id as input, not the file name. Should I change my implementation to
>>>use
>>> >file name instead of product id ?
>>> >
>>> >Best Regards,
>>> >Sanjaya
>>> >
>>> >
>>> >On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) <
>>> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >> Hey Sanjaya,
>>> >>
>>> >> Easy, see the attached PGEConfig.xml here:
>>> >>
>>> >> http://paste.apache.org/6OGW
>>> >>
>>> >> In that file:
>>> >>
>>> >> 1. We compute the staged file path by computing JobDir
>>> >> 2. We create in the exe block a staged input dir
>>> >> 3. We stage the files just using cps in the exeBlock (could have
>>> >> just as easily used fileStager)
>>> >> 4. We know that the file is [JobInputDir]/[Filename]
>>> >>
>>> >> HTH.
>>> >>
>>> >> Cheers,
>>> >> Chris
>>> >>
>>> >>
>>> >> -----Original Message-----
>>> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> Date: Friday, June 14, 2013 5:02 AM
>>> >> To: Airavata Dev <de...@airavata.apache.org>
>>> >> Subject: Re: Apache Airavata-OODT Integration
>>> >>
>>> >> >Thanks Chris for your input. I actually use the PGETaskInstance for
>>> >>file
>>> >> >staging with minimal additional code. But my issue issue not with
>>>the
>>> >>file
>>> >> >staging. As per my current implementation, application inputs
>>>product
>>> >>id.
>>> >> >Then using the capabilities in PGETaskInstance class, it does the
>>>file
>>> >> >staging. But my issue is that during the file staging product is
>>> >>mapped to
>>> >> >a file in specified working directory. I don't have a way to
>>>retrieve
>>> >>the
>>> >> >staged file name, as it is not recorded in Metadata (For this
>>>purpose,
>>> >>I
>>> >> >query the FileManager again to get the corresponding reference name
>>> >>for a
>>> >> >given product id). I need the staged file path, since I modify the
>>> >>input
>>> >> >product id into staged file path prior to actual workflow
>>>invocation.
>>> >> >Basically I am looking for some implementation where I can easily
>>> >> >retrieve,
>>> >> >staged file path for a given product id.
>>> >> >
>>> >> >Cheers,
>>> >> >Sanjaya
>>> >> >
>>> >> >
>>> >> >On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
>>> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >> >
>>> >> >> Hi Sanjaya,
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >>
>>> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> Date: Monday, June 10, 2013 5:20 PM
>>> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
>>> >> >> Subject: Re: Apache Airavata-OODT Integration
>>> >> >>
>>> >> >> >Hi Chris,
>>> >> >> >       On configuration, I have get rid of all the configuration
>>> >>files,
>>> >> >> >including pge-config.xml. All the required configurations are
>>> >> >> >programmatically set.  Configurations such FileManagerServer URL
>>> are
>>> >> >> >configured in the airavata-server.properties file. I'll update
>>>the
>>> >> >>review
>>> >> >> >request with modified details.
>>> >> >>
>>> >> >> Great work!
>>> >> >>
>>> >> >>
>>> >> >> >       Still I am not quite clear on how to retrieve staged file
>>> >>path
>>> >> >> >properly. Currently I am using getStagedFilePath method
>>> >> >> >in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged
>>>file
>>> >> >>path.
>>> >> >> >While I am going through the OODT code that I have seen method
>>>in
>>> >> >> >DataTransferer to notify FileManagerServer once transfer is
>>> >>completed.
>>> >> >>But
>>> >> >> >I couldn't see the same for product retrieval.
>>> >> >>
>>> >> >> Example:
>>> >> >>
>>> >> >>
>>> >> >> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
>>> >> >>
>>> >> >>
>>> >> >> Review Board tickets:
>>> >> >> https://reviews.apache.org/r/4746/
>>> >> >>
>>> >> >> https://reviews.apache.org/r/5382/
>>> >> >>
>>> >> >>
>>> >> >> JIRA issue source (in OODT since 0.4):
>>> >> >>   https://issues.apache.org/jira/browse/OODT-443
>>> >> >>
>>> >> >>
>>> >> >> >       As you suggested I'll improve my workflow using Apache
>>>Tika.
>>> >>I'd
>>> >> >> >like to continue this as an Parallal task. While modifying
>>>staging
>>> >> >> >implementation based on community feedback, currently I am
>>>looking
>>> >>at
>>> >> >> >ingesting output back to OODT.
>>> >> >>
>>> >> >> See above for info on file staging. I would strongly encourage
>>>you
>>> >>not
>>> >> >> to reimplement CAS-PGE in Airavata -- it's pretty functional and
>>> >> >>expressive
>>> >> >> anyways and I would work to figure out how to make Airavata
>>>leverage
>>> >> >> CAS-PGE.
>>> >> >>
>>> >> >> Cheers,
>>> >> >> Chris
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
>>> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >> >> >
>>> >> >> >> Hi Sanjaya,
>>> >> >> >>
>>> >> >> >> I think starting out with /bin/ls would be good, maybe like a
>>> >>/bin/ls
>>> >> >> >> workflow, and then for each file returned, maybe run Apache
>>>Tika
>>> >>and
>>> >> >> >> extract its metadata and then pipe that to a file?
>>> >> >> >>
>>> >> >> >> How about that?
>>> >> >> >>
>>> >> >> >> Cheers,
>>> >> >> >> Chris
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> -----Original Message-----
>>> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> >> Date: Tuesday, June 4, 2013 5:31 AM
>>> >> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
>>> >> >> >> Subject: Re: Apache Airavata-OODT Integration
>>> >> >> >>
>>> >> >> >> >Hi Chris,
>>> >> >> >> >     Please see my comments below on the two items.
>>> >> >> >> >
>>> >> >> >> >Configuration : It should be possible to set them
>>> >>programmatically.
>>> >> >> >> >Actually I have implemented partly it for file staging
>>> >>information.
>>> >> >> >>I'll
>>> >> >> >> >work to get rid of the other configuration files.
>>> >> >> >> >
>>> >> >> >> >Staged File Path : I'll work on the suggested approach,
>>>though I
>>> >>am
>>> >> >>not
>>> >> >> >> >fully understand it at the moment. I guess I need to go
>>>through
>>> >>bit
>>> >> >> >>more
>>> >> >> >> >on
>>> >> >> >> >CAS-PGE and come back to you on the proposed approach.
>>> >> >> >> >
>>> >> >> >> >Currently I am testing this by wrapping /bin/ls command as
>>>GFac
>>> >> >> >>service. I
>>> >> >> >> >may need to test this with real workflow. Could you please
>>> >>provide
>>> >> >>me
>>> >> >> >>know
>>> >> >> >> >some guidance on better scenario to test this.
>>> >> >> >> >
>>> >> >> >> >Cheers,
>>> >> >> >> >Sanjaya
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
>>> >> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >> >> >> >
>>> >> >> >> >> Hi Sanjaya,
>>> >> >> >> >>
>>> >> >> >> >> -----Original Message-----
>>> >> >> >> >>
>>> >> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> >> >> >> Reply-To: "dev@airavata.apache.org"
>>><de...@airavata.apache.org>
>>> >> >> >> >> Date: Thursday, May 30, 2013 5:12 AM
>>> >> >> >> >> To: "dev@oodt.apache.org" <de...@oodt.apache.org>,
>>> >> >> >> >>"dev@airavata.apache.org"
>>> >> >> >> >> <de...@airavata.apache.org>
>>> >> >> >> >> Subject: Apache Airavata-OODT Integration
>>> >> >> >> >>
>>> >> >> >> >> >Hi,
>>> >> >> >> >> >     I have worked on the Apache Airavata integration with
>>> >>Apache
>>> >> >> >> >>OODT. As
>>> >> >> >> >> >a first step, I have implemented integration with Apache
>>>OODT
>>> >> >>file
>>> >> >> >> >> >manager component.
>>> >> >> >> >>
>>> >> >> >> >> Great work!!
>>> >> >> >> >>
>>> >> >> >> >> Comments below:
>>> >> >> >> >>
>>> >> >> >> >> >      1. Introduce a new GFac Schema type called
>>>OODTProduct
>>> >> >>which
>>> >> >> >> >>takes
>>> >> >> >> >> >APache OODT product IDs as input.
>>> >> >> >> >> >      2. Implemented new pre GFac Handler by extending
>>>Apache
>>> >> >>OODT
>>> >> >> >> >> >PgeTaskInstance to stage the corresponding file into the
>>> >>working
>>> >> >> >> >> >directory.
>>> >> >> >> >> >      3. Once file is staged, input parameter with OODT
>>> >>product
>>> >> >>id
>>> >> >> >>is
>>> >> >> >> >> >replaced with path of the staged file for downstream
>>> >>processing
>>> >> >> >> >> >
>>> >> >> >> >> >I have tested the implementation with Gfac application
>>>which
>>> >> >>wraps
>>> >> >> >> >>/bin/ls
>>> >> >> >> >> >command. Application takes product id as input and stage
>>> >> >> >>corresponding
>>> >> >> >> >> >file
>>> >> >> >> >> >into the working directory and /bin/ls is executed against
>>> the
>>> >> >> >>staged
>>> >> >> >> >> >file.
>>> >> >> >> >> >Hope this is a valid testing scenario.
>>> >> >> >> >> >
>>> >> >> >> >> >Concerns
>>> >> >> >> >> >- Configurations : I have added new configuration file
>>>named
>>> >>and
>>> >> >> >> >> >oodt-integration.properties in addition to
>>> >>dynamic_metadata.met
>>> >> >>and
>>> >> >> >> >> >pge-config.xml files used by OODT. But at the moment
>>>there is
>>> >>no
>>> >> >> >>item
>>> >> >> >> >> >configured with the oodt-integration.properties.
>>> >> >> >> >>
>>> >> >> >> >> You probably only need the pge-config.xml file. Dynamic
>>> >>metadata,
>>> >> >>and
>>> >> >> >> >>the
>>> >> >> >> >> task configuration properties can be specified
>>> >>programmatically,
>>> >> >> >>right?
>>> >> >> >> >>
>>> >> >> >> >> >- Staged File Name - With the current implementation of
>>> >> >> >> >>PgeTaskInstance it
>>> >> >> >> >> >is not possible to retrieve path of the staged file. Due
>>>to
>>> >>this
>>> >> >> >> >> >limitation, I have query the FileManagerServer with
>>>product
>>> id
>>> >> >>and
>>> >> >> >> >> >retrieve
>>> >> >> >> >> >the file name and computed the file path using
>>>information of
>>> >> >> >>working
>>> >> >> >> >> >directory.
>>> >> >> >> >>
>>> >> >> >> >> I'm not sure I understand this? If you store and record the
>>> >> >>Filename,
>>> >> >> >> >>and
>>> >> >> >> >> FileLocation
>>> >> >> >> >> metadata files, then you can easily retrieve the staged
>>>file
>>> >>path
>>> >> >> >>via a
>>> >> >> >> >> SQLquery
>>> >> >> >> >> via CAS-PGE by simply setting the
>>> >> >>FORMAT=('$FileLocation/$Filename')
>>> >> >> >>in
>>> >> >> >> >> the response.
>>> >> >> >> >> Can you comment on this?
>>> >> >> >> >>
>>> >> >> >> >> >- Currently it is not possible to execute the workflow
>>>using
>>> >> >>Xbaya
>>> >> >> >>due
>>> >> >> >> >>to
>>> >> >> >> >> >validation failure due to new schema type. I have
>>>commented
>>> >>out
>>> >> >>the
>>> >> >> >> >> >relevant validation code for testing purpose.
>>> >> >> >> >>
>>> >> >> >> >> OK, will probably need to work on this.
>>> >> >> >> >>
>>> >> >> >> >> >
>>> >> >> >> >> >Currently I am having an issue with review board client
>>>tool
>>> >>and
>>> >> >> >>need
>>> >> >> >> >>to
>>> >> >> >> >> >resolve it to upload the code for review.
>>> >> >> >> >>
>>> >> >> >> >> I see later that you got this working, so will head over
>>>and
>>> >> >>review
>>> >> >> >>that
>>> >> >> >> >> now.
>>> >> >> >> >>
>>> >> >> >> >> Thanks!
>>> >> >> >> >>
>>> >> >> >> >> Cheers,
>>> >> >> >> >> Chris
>>> >> >> >> >>
>>> >> >> >> >>


Re: Apache Airavata-OODT Integration

Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.
Hi Sanjaya,

-----Original Message-----

From: Sanjaya Medonsa <sa...@gmail.com>
Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
Date: Monday, July 8, 2013 12:09 AM
To: Airavata Dev <de...@airavata.apache.org>
Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
Subject: Re: Apache Airavata-OODT Integration

>Hi Chris,
>     I have started looking at changing the current implementation to use
>file Name instead of product id. As per the current PGETask wrapper
>implementation, it takes two inputs (Product ID or file path at the remote
>location. If filePath is used force staging should be set. But I am not
>quite sure what it means by force staging).

Force staging I believe controls whether or not the staged files are
overwritten.

> If I am to use the current
>provisions in PGETaskWrapper, then remote file path (Not the file  name)
>has to be given as input. I am not quite sure whether it is ideal to use
>file path instead of file name.

You can easily generate the file path. It does not have to be remote; in
fact, it could easily be local. In Apache OODT we typically ensure it's
local by using distributed filesystems like HDFS, NFS, or Gluster to make
remote files appear local, pushing that portion down into the distributed
filesystem, which we think does a better job of data movement :). To
generate the file path you can use the CAS-PGE SQLQuery facility, which
lets you look up, e.g., $FileLocation/$Filename based on met fields and
then feed the result into the path.


>If the filename is to be used as input, then
>FilesStager needs to be customized to retrieve product references from
>the file name.

See above for an alternative.

>File manager client doesn't have a mechanism to retrieve product
>by file name. But it has mechanism to retrieve product by product name. I
>guess typically both are the same.

Yeah, or the other easy mechanism is simply to issue a query, e.g., build
yourself a Filename query and then query the FM Catalog.

>One drawback of this approach is that it
>doesn't support a list of product names. The method getProductReferences,
>which returns a list of products, relies on a back-end implementation that
>is based on product id, though the actual input is a Product (a Product
>with just the product name set cannot be used as input). Please let me
>know your thoughts.

See above.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



>
>
>
>
>On Mon, Jun 17, 2013 at 5:52 PM, Sanjaya Medonsa
><sa...@gmail.com>wrote:
>
>> Thanks Chris. I'll update the implementation to use file name instead of
>> OODT product id.
>>
>> Cheers,
>> Sanjaya
>>
>>
>> On Sun, Jun 16, 2013 at 12:51 AM, Mattmann, Chris A (398J) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey Sanjaya, sure +1 use the Filename. It's not guaranteed to be
>>>unique,
>>> but you can easily just pop the first one off the top (latest) and take
>>> that (since it's sorted by product received time). You may check out
>>>the
>>> pcs-core module and some of its internal classes like FileManagerUtils
>>> to see some cool helper functions that could aid in this regard.
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>> -----Original Message-----
>>> From: Sanjaya Medonsa <sa...@gmail.com>
>>> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> Date: Saturday, June 15, 2013 4:04 AM
>>> To: Airavata Dev <de...@airavata.apache.org>
>>> Subject: Re: Apache Airavata-OODT Integration
>>>
>>> >Thanks Chris for your help! Working directory is available in
>>> >JobExecutionContext in Airavata and directory can easily be retrieved.
>>> >Issue in my case is that, from XBaya GUI I take product id as input
>>>not
>>> >the
>>> >file name. Internally file stager query the file manager using
>>>product id
>>> >to retrieve product reference and corresponding file name to stage the
>>> >file
>>> >into input dir. Since this product id to file name mapping happens
>>> >internally during the file staging, my implementation don't have
>>>access
>>> to
>>> >filename unless I query the file manager to retrieve the corresponding
>>> >file
>>> >name using product id.
>>> >
>>> >One of the major issue in my implementation seems that I use OODT
>>>product
>>> >id as input, not the file name. Should I change my implementation to
>>>use
>>> >file name instead of product id ?
>>> >
>>> >Best Regards,
>>> >Sanjaya
>>> >
>>> >
>>> >On Fri, Jun 14, 2013 at 8:51 PM, Mattmann, Chris A (398J) <
>>> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >> Hey Sanjaya,
>>> >>
>>> >> Easy, see the attached PGEConfig.xml here:
>>> >>
>>> >> http://paste.apache.org/6OGW
>>> >>
>>> >> In that file:
>>> >>
>>> >> 1. We compute the staged file path by computing JobDir
>>> >> 2. We create in the exe block a staged input dir
>>> >> 3. We stage the files just using cps in the exeBlock (could have
>>> >> just as easily used fileStager)
>>> >> 4. We know that the file is [JobInputDir]/[Filename]
>>> >>
>>> >> HTH.
>>> >>
>>> >> Cheers,
>>> >> Chris
>>> >>
>>> >>
>>> >> -----Original Message-----
>>> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> Date: Friday, June 14, 2013 5:02 AM
>>> >> To: Airavata Dev <de...@airavata.apache.org>
>>> >> Subject: Re: Apache Airavata-OODT Integration
>>> >>
>>> >> >Thanks Chris for your input. I actually use the PGETaskInstance for
>>> >>file
>>> >> >staging with minimal additional code. But my issue issue not with
>>>the
>>> >>file
>>> >> >staging. As per my current implementation, application inputs
>>>product
>>> >>id.
>>> >> >Then using the capabilities in PGETaskInstance class, it does the
>>>file
>>> >> >staging. But my issue is that during the file staging product is
>>> >>mapped to
>>> >> >a file in specified working directory. I don't have a way to
>>>retrieve
>>> >>the
>>> >> >staged file name, as it is not recorded in Metadata (For this
>>>purpose,
>>> >>I
>>> >> >query the FileManager again to get the corresponding reference name
>>> >>for a
>>> >> >given product id). I need the staged file path, since I modify the
>>> >>input
>>> >> >product id into staged file path prior to actual workflow
>>>invocation.
>>> >> >Basically I am looking for some implementation where I can easily
>>> >> >retrieve,
>>> >> >staged file path for a given product id.
>>> >> >
>>> >> >Cheers,
>>> >> >Sanjaya
>>> >> >
>>> >> >
>>> >> >On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
>>> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >> >
>>> >> >> Hi Sanjaya,
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >>
>>> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> Date: Monday, June 10, 2013 5:20 PM
>>> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
>>> >> >> Subject: Re: Apache Airavata-OODT Integration
>>> >> >>
>>> >> >> >Hi Chris,
>>> >> >> >       On configuration, I have get rid of all the configuration
>>> >>files,
>>> >> >> >including pge-config.xml. All the required configurations are
>>> >> >> >programmatically set.  Configurations such FileManagerServer URL
>>> are
>>> >> >> >configured in the airavata-server.properties file. I'll update
>>>the
>>> >> >>review
>>> >> >> >request with modified details.
>>> >> >>
>>> >> >> Great work!
>>> >> >>
>>> >> >>
>>> >> >> >       Still I am not quite clear on how to retrieve staged file
>>> >>path
>>> >> >> >properly. Currently I am using getStagedFilePath method
>>> >> >> >in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged
>>>file
>>> >> >>path.
>>> >> >> >While I am going through the OODT code that I have seen method
>>>in
>>> >> >> >DataTransferer to notify FileManagerServer once transfer is
>>> >>completed.
>>> >> >>But
>>> >> >> >I couldn't see the same for product retrieval.
>>> >> >>
>>> >> >> Example:
>>> >> >>
>>> >> >>
>>> >> >> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
>>> >> >>
>>> >> >>
>>> >> >> Review Board tickets:
>>> >> >> https://reviews.apache.org/r/4746/
>>> >> >>
>>> >> >> https://reviews.apache.org/r/5382/
>>> >> >>
>>> >> >>
>>> >> >> JIRA issue source (in OODT since 0.4):
>>> >> >>   https://issues.apache.org/jira/browse/OODT-443
>>> >> >>
>>> >> >>
>>> >> >> >       As you suggested I'll improve my workflow using Apache
>>>Tika.
>>> >>I'd
>>> >> >> >like to continue this as an Parallal task. While modifying
>>>staging
>>> >> >> >implementation based on community feedback, currently I am
>>>looking
>>> >>at
>>> >> >> >ingesting output back to OODT.
>>> >> >>
>>> >> >> See above for info on file staging. I would strongly encourage
>>>you
>>> >>not
>>> >> >> to reimplement CAS-PGE in Airavata -- it's pretty functional and
>>> >> >>expressive
>>> >> >> anyways and I would work to figure out how to make Airavata
>>>leverage
>>> >> >> CAS-PGE.
>>> >> >>
>>> >> >> Cheers,
>>> >> >> Chris
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
>>> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >> >> >
>>> >> >> >> Hi Sanjaya,
>>> >> >> >>
>>> >> >> >> I think starting out with /bin/ls would be good, maybe like a
>>> >>/bin/ls
>>> >> >> >> workflow, and then for each file returned, maybe run Apache
>>>Tika
>>> >>and
>>> >> >> >> extract its metadata and then pipe that to a file?
>>> >> >> >>
>>> >> >> >> How about that?
>>> >> >> >>
>>> >> >> >> Cheers,
>>> >> >> >> Chris
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> -----Original Message-----
>>> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> >> >> Reply-To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> >> Date: Tuesday, June 4, 2013 5:31 AM
>>> >> >> >> To: "dev@airavata.apache.org" <de...@airavata.apache.org>
>>> >> >> >> Cc: "dev@oodt.apache.org" <de...@oodt.apache.org>
>>> >> >> >> Subject: Re: Apache Airavata-OODT Integration
>>> >> >> >>
>>> >> >> >> >Hi Chris,
>>> >> >> >> >     Please see my comments below on the two items.
>>> >> >> >> >
>>> >> >> >> >Configuration : It should be possible to set them
>>> >>programmatically.
>>> >> >> >> >Actually I have implemented partly it for file staging
>>> >>information.
>>> >> >> >>I'll
>>> >> >> >> >work to get rid of the other configuration files.
>>> >> >> >> >
>>> >> >> >> >Staged File Path : I'll work on the suggested approach,
>>>though I
>>> >>am
>>> >> >>not
>>> >> >> >> >fully understand it at the moment. I guess I need to go
>>>through
>>> >>bit
>>> >> >> >>more
>>> >> >> >> >on
>>> >> >> >> >CAS-PGE and come back to you on the proposed approach.
>>> >> >> >> >
>>> >> >> >> >Currently I am testing this by wrapping /bin/ls command as
>>>GFac
>>> >> >> >>service. I
>>> >> >> >> >may need to test this with real workflow. Could you please
>>> >>provide
>>> >> >>me
>>> >> >> >>know
>>> >> >> >> >some guidance on better scenario to test this.
>>> >> >> >> >
>>> >> >> >> >Cheers,
>>> >> >> >> >Sanjaya
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
>>> >> >> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >> >> >> >
>>> >> >> >> >> Hi Sanjaya,
>>> >> >> >> >>
>>> >> >> >> >> -----Original Message-----
>>> >> >> >> >>
>>> >> >> >> >> From: Sanjaya Medonsa <sa...@gmail.com>
>>> >> >> >> >> Reply-To: "dev@airavata.apache.org"
>>><de...@airavata.apache.org>
>>> >> >> >> >> Date: Thursday, May 30, 2013 5:12 AM
>>> >> >> >> >> To: "dev@oodt.apache.org" <de...@oodt.apache.org>,
>>> >> >> >> >>"dev@airavata.apache.org"
>>> >> >> >> >> <de...@airavata.apache.org>
>>> >> >> >> >> Subject: Apache Airavata-OODT Integration
>>> >> >> >> >>
>>> >> >> >> >> >Hi,
>>> >> >> >> >> >     I have worked on the Apache Airavata integration with Apache
>>> >> >> >> >> >OODT. As a first step, I have implemented integration with the
>>> >> >> >> >> >Apache OODT file manager component.
>>> >> >> >> >>
>>> >> >> >> >> Great work!!
>>> >> >> >> >>
>>> >> >> >> >> Comments below:
>>> >> >> >> >>
>>> >> >> >> >> >      1. Introduced a new GFac Schema type called OODTProduct,
>>> >> >> >> >> >which takes Apache OODT product IDs as input.
>>> >> >> >> >> >      2. Implemented a new pre GFac Handler by extending Apache
>>> >> >> >> >> >OODT PgeTaskInstance to stage the corresponding file into the
>>> >> >> >> >> >working directory.
>>> >> >> >> >> >      3. Once the file is staged, the input parameter with the
>>> >> >> >> >> >OODT product ID is replaced with the path of the staged file for
>>> >> >> >> >> >downstream processing.
>>> >> >> >> >> >
>>> >> >> >> >> >I have tested the implementation with a GFac application which
>>> >> >> >> >> >wraps the /bin/ls command. The application takes a product ID as
>>> >> >> >> >> >input and stages the corresponding file into the working
>>> >> >> >> >> >directory, and /bin/ls is executed against the staged file.
>>> >> >> >> >> >Hope this is a valid testing scenario.
>>> >> >> >> >> >
>>> >> >> >> >> >Concerns
>>> >> >> >> >> >- Configurations : I have added a new configuration file named
>>> >> >> >> >> >oodt-integration.properties in addition to the
>>> >> >> >> >> >dynamic_metadata.met and pge-config.xml files used by OODT. But at
>>> >> >> >> >> >the moment there is no item configured in
>>> >> >> >> >> >oodt-integration.properties.
>>> >> >> >> >>
>>> >> >> >> >> You probably only need the pge-config.xml file. Dynamic metadata
>>> >> >> >> >> and the task configuration properties can be specified
>>> >> >> >> >> programmatically, right?
>>> >> >> >> >>
>>> >> >> >> >> >- Staged File Name - With the current implementation of
>>> >> >> >> >> >PgeTaskInstance it is not possible to retrieve the path of the
>>> >> >> >> >> >staged file. Due to this limitation, I query the FileManagerServer
>>> >> >> >> >> >with the product ID to retrieve the file name, and compute the
>>> >> >> >> >> >file path using the working directory information.
>>> >> >> >> >>
>>> >> >> >> >> I'm not sure I understand this? If you store and record the
>>> >> >> >> >> Filename and FileLocation metadata files, then you can easily
>>> >> >> >> >> retrieve the staged file path via a SQL query via CAS-PGE by simply
>>> >> >> >> >> setting the FORMAT=('$FileLocation/$Filename') in the response.
>>> >> >> >> >> Can you comment on this?
>>> >> >> >> >>
>>> >> >> >> >> >- Currently it is not possible to execute the workflow using
>>> >> >> >> >> >XBaya because validation fails for the new schema type. I have
>>> >> >> >> >> >commented out the relevant validation code for testing purposes.
>>> >> >> >> >>
>>> >> >> >> >> OK, will probably need to work on this.
>>> >> >> >> >>
>>> >> >> >> >> >
>>> >> >> >> >> >Currently I am having an issue with the Review Board client tool
>>> >> >> >> >> >and need to resolve it to upload the code for review.
>>> >> >> >> >>
>>> >> >> >> >> I see later that you got this working, so will head over and review
>>> >> >> >> >> that now.
>>> >> >> >> >>
>>> >> >> >> >> Thanks!
>>> >> >> >> >>
>>> >> >> >> >> Cheers,
>>> >> >> >> >> Chris
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> >> >> >> Chris Mattmann, Ph.D.
>>> >> >> >> >> Senior Computer Scientist
>>> >> >> >> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> >> >> >> >> Office: 171-266B, Mailstop: 171-246
>>> >> >> >> >> Email: chris.a.mattmann@nasa.gov
>>> >> >> >> >> WWW:  http://sunset.usc.edu/~mattmann/
>>> >> >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> >> >> >> Adjunct Assistant Professor, Computer Science Department
>>> >> >> >> >> University of Southern California, Los Angeles, CA 90089 USA
>>> >> >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >>
>>> >> >>
>>> >>
>>> >>
>>>
>>>
>>