Posted to commits@oodt.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/02/08 00:41:21 UTC
[jira] [Created] (OODT-667) CAS-PGE no longer respects writers and
file tags from earlier pgeConfig.xml files
Chris A. Mattmann created OODT-667:
--------------------------------------
Summary: CAS-PGE no longer respects writers and file tags from earlier pgeConfig.xml files
Key: OODT-667
URL: https://issues.apache.org/jira/browse/OODT-667
Project: OODT
Issue Type: Bug
Components: pge wrapper framework
Affects Versions: 0.6, 0.5, 0.4
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 0.7
It has been a long-standing bug since Apache OODT 0.3 (0.4 and beyond) that the updates to CAS-PGE to simplify its crawling system for met extraction based on files and regExp tags, and to unify it with the AutoDetectProductCrawler, have caused cas-pge to no longer honor the following blocks from pgeConfig.xml files:
{code:xml}
<output>
<dir>
<files regExp="someRegExp" metWriter="some.class" args="some args"/>
<!--...-->
</dir>
</output>
{code}
This was a conscious decision, discussed by Brian Foster, myself, and others on several occasions:
https://issues.apache.org/jira/browse/OODT-426
http://markmail.org/message/oe5tmutu374wqldb
I support Brian's implementation, but I think we took a step back by not offering backwards compatibility that simply:
1. still reads the pgeConfig.xml files tags above; and then
2. constructs the appropriate AutoDetectCrawler and RenamingConventions and other plumbing behind the scenes.
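As a rough illustration of step 1 only, the sketch below reads the legacy files tags with the JDK's DOM parser. The class LegacyFilesTagReader and its FilesTag holder are hypothetical names made up for this example and are not part of the OODT API. Step 2 (building the crawler and renaming plumbing from these values) is deliberately left out, since it depends on the actual CAS-PGE classes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not an OODT class.
class LegacyFilesTagReader {

    /** One legacy files entry: its regExp, metWriter class name, and raw args string. */
    static final class FilesTag {
        final String regExp;
        final String metWriter;
        final String args;

        FilesTag(String regExp, String metWriter, String args) {
            this.regExp = regExp;
            this.metWriter = metWriter;
            this.args = args;
        }
    }

    /** Collects every files element found anywhere in the given XML text. */
    static List<FilesTag> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("files");
        List<FilesTag> tags = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            // getAttribute returns an empty string when the attribute is absent.
            tags.add(new FilesTag(
                    e.getAttribute("regExp"),
                    e.getAttribute("metWriter"),
                    e.getAttribute("args")));
        }
        return tags;
    }

    public static void main(String[] argv) throws Exception {
        // Same fragment as in the issue description, used as a stand-alone document.
        String xml = "<output><dir>"
                + "<files regExp=\"someRegExp\" metWriter=\"some.class\" args=\"some args\"/>"
                + "</dir></output>";
        for (FilesTag t : read(xml)) {
            System.out.println(t.regExp + " -> " + t.metWriter + " [" + t.args + "]");
        }
    }
}
```

Each FilesTag would then be the input for whatever step 2 ends up constructing behind the scenes.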
Note that one of the key features that becomes important in these situations is having CAS-PGE job directories contain the serialized metadata files for offline inspection in case there are errors. We have currently lost support for that (as evidenced by the removal of the met key MET_FILE_EXT). I am also going to add that back in by subclassing AutoDetectProductCrawler in cas-pge and overriding its crawling step to also serialize the met files it generates.
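The subclass-and-override idea can be sketched roughly as follows. This is not the real AutoDetectProductCrawler API: StandInCrawler, handleFile, extractMetadata, and the metFileExt constructor argument are all hypothetical stand-ins so that the example compiles on its own, and the key=value output is a simplification of whatever format the actual met files use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only; none of these types exist in OODT.
class MetSerializationSketch {

    /** Stand-in for a crawler base class that keeps metadata in memory only. */
    static class StandInCrawler {
        /** Placeholder extraction: records the file name and size. */
        protected Map<String, String> extractMetadata(Path product) throws IOException {
            Map<String, String> met = new LinkedHashMap<>();
            met.put("Filename", product.getFileName().toString());
            met.put("FileSize", Long.toString(Files.size(product)));
            return met;
        }

        /** Crawling step for a single product file. */
        public Map<String, String> handleFile(Path product) throws IOException {
            return extractMetadata(product);
        }
    }

    /** Overrides the crawling step to also write the metadata next to the product. */
    static class MetSerializingCrawler extends StandInCrawler {
        private final String metFileExt; // plays the role of a MET_FILE_EXT-style setting

        MetSerializingCrawler(String metFileExt) {
            this.metFileExt = metFileExt;
        }

        @Override
        public Map<String, String> handleFile(Path product) throws IOException {
            Map<String, String> met = super.handleFile(product);
            // Write <product name>.<ext> into the same (job) directory.
            Path metFile = product.resolveSibling(product.getFileName() + "." + metFileExt);
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : met.entrySet()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
            Files.write(metFile, sb.toString().getBytes(StandardCharsets.UTF_8));
            return met;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pge-job");
        Path product = Files.write(dir.resolve("output.dat"),
                "data".getBytes(StandardCharsets.UTF_8));
        new MetSerializingCrawler("met").handleFile(product);
        System.out.print(new String(
                Files.readAllBytes(dir.resolve("output.dat.met")), StandardCharsets.UTF_8));
    }
}
```

Calling the parent step first and serializing afterwards keeps the crawling behavior unchanged, so the file on disk is purely an extra artifact for offline inspection.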
That will get us back to full forwards and backwards compatibility starting in 0.7 for *all* versions of CAS-PGE pgeConfig.xml files. Wish me luck!
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)