Posted to user@oodt.apache.org by "Nguyen, Ricky" <rn...@chla.usc.edu> on 2012/02/10 21:41:15 UTC

adding metadata to existing products

How would I add metadata to existing products (already ingested into the FileMgr) without re-ingesting them? Re-ingesting produces a new product ID and a new Lucene document, even though the product itself hasn't changed.

Are these possible solutions: Can the crawler run multiple met extractors on each file to be ingested? Or is there a way to get PGE tasks to update an existing product's metadata?

Thanks,
Ricky



Re: adding metadata to existing products

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Ricky,

On Feb 10, 2012, at 12:41 PM, Nguyen, Ricky wrote:

> How would I add metadata to existing (already ingested to FileMgr) products, without re-ingesting (producing a new product ID and lucene document because the product hasn't changed)?

Great question! In general, here are a couple of ways to do this:

1. Via XML-RPC
  - there is an updateMetadata method, introduced in 0.4-SNAPSHOT (trunk) as of OODT-256,
that updates the metadata of an existing product: http://s.apache.org/0Ww

2. Use the CAS Curator and its REST API, documented here:

http://oodt.apache.org/components/maven/curator/api/index.html

One of the underlying methods (I'm not sure whether it's documented at the
link above) updates the metadata for an existing product. Caveat: this
only works with the LuceneCatalog, and the Curator includes a forked
version of it, written by Paul Ramirez, that adds an updateMetadata
capability. It would be great to bring this fork back into trunk and
re-align the two; we just haven't had the cycles yet.
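To make the "update in place" semantics concrete, here is a toy, in-memory sketch. It does not use OODT's classes: InMemoryCatalog, MetadataUpdates, and the "ReviewStatus" key are hypothetical names for illustration. It assumes that an update replaces the stored metadata wholesale, so the client reads the current metadata, adds the new field, and writes the merged set back under the same product ID, while ingest always mints a new ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a File Manager catalog.
// NOT the OODT API; it only models ingest vs. update-in-place.
class InMemoryCatalog {
    // productId -> (metadata key -> values)
    private final Map<String, Map<String, List<String>>> products = new HashMap<>();

    // Ingest always mints a fresh product ID, as re-ingesting would.
    String ingest(Map<String, List<String>> met) {
        String id = UUID.randomUUID().toString();
        products.put(id, deepCopy(met));
        return id;
    }

    // Returns a copy so callers cannot mutate the stored metadata directly.
    Map<String, List<String>> getMetadata(String productId) {
        Map<String, List<String>> met = products.get(productId);
        if (met == null) {
            throw new IllegalArgumentException("No product with ID " + productId);
        }
        return deepCopy(met);
    }

    // Assumed semantics: replace the stored metadata, keeping the same ID.
    void updateMetadata(String productId, Map<String, List<String>> met) {
        if (!products.containsKey(productId)) {
            throw new IllegalArgumentException("No product with ID " + productId);
        }
        products.put(productId, deepCopy(met));
    }

    int size() {
        return products.size();
    }

    private static Map<String, List<String>> deepCopy(Map<String, List<String>> met) {
        Map<String, List<String>> copy = new HashMap<>();
        met.forEach((k, v) -> copy.put(k, new ArrayList<>(v)));
        return copy;
    }
}

class MetadataUpdates {
    // Read-modify-write: fetch current metadata, add one value, write it back.
    static void addField(InMemoryCatalog catalog, String productId, String key, String value) {
        Map<String, List<String>> met = catalog.getMetadata(productId);
        met.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        catalog.updateMetadata(productId, met);
    }
}
```

The read-modify-write step matters under the wholesale-replace assumption: sending only the new key would drop every field that was already there.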

> 
> Are these possible solutions: Can crawler run multiple metExtractors on each file to be ingested? Or perhaps there is a way to get PGE tasks to update an existing product's metadata?

Yep, the crawler can run multiple met extractors. To do this, either use the AutoDetectCrawler,
or develop a met extractor that runs the series of other extractors you want. As for the PGE tasks,
you can certainly leverage the PGE's control flow to do metadata updates via PGE extractors as well;
those updates happen *after* execution but *before* crawling.
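The second crawler option, a met extractor that runs a series of other extractors, is essentially the composite pattern. Below is a minimal sketch under stated assumptions: SimpleMetExtractor and ChainedMetExtractor are hypothetical names with a simplified interface, not OODT's actual MetExtractor API, and the merge policy (append values when two extractors emit the same key) is an assumption, not OODT behavior.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified extractor interface (not OODT's MetExtractor).
interface SimpleMetExtractor {
    Map<String, List<String>> extract(File file);
}

// Runs a fixed series of extractors, in order, and merges their output.
class ChainedMetExtractor implements SimpleMetExtractor {
    private final List<SimpleMetExtractor> chain;

    ChainedMetExtractor(List<SimpleMetExtractor> chain) {
        this.chain = new ArrayList<>(chain);
    }

    @Override
    public Map<String, List<String>> extract(File file) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (SimpleMetExtractor extractor : chain) {
            // Append values so no extractor silently overwrites another's output.
            extractor.extract(file).forEach((key, values) ->
                merged.computeIfAbsent(key, k -> new ArrayList<>()).addAll(values));
        }
        return merged;
    }
}
```

Appending on key collisions keeps every extractor's contribution; a "last extractor wins" policy would be equally valid if later extractors are meant to override earlier ones.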

HTH!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

