Posted to commits@oodt.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/08/10 09:38:11 UTC

[jira] [Resolved] (OODT-667) CAS-PGE no longer respects writers and file tags from earlier pgeConfig.xml files

     [ https://issues.apache.org/jira/browse/OODT-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved OODT-667.
------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.8)
                   0.7

- Finally got this working, 3 versions later! The fix is committed in r1617057; it fully restores backwards compat and also supports forward compat. I also committed fixes to the wmgr script in workflow and in RADIX, so CAS-PGE 0.7 is fully fixed in trunk going forward!

> CAS-PGE no longer respects writers and file tags from earlier pgeConfig.xml files
> ---------------------------------------------------------------------------------
>
>                 Key: OODT-667
>                 URL: https://issues.apache.org/jira/browse/OODT-667
>             Project: OODT
>          Issue Type: Bug
>          Components: pge wrapper framework
>    Affects Versions: 0.4, 0.5, 0.6
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>              Labels: back, compat, config, files, fix, pge, regex
>             Fix For: 0.7
>
>         Attachments: OODT-667.Mattmann.060914.patch.txt
>
>
> It's been a long-standing bug since Apache OODT 0.3 (that is, in 0.4 and beyond) that the updates to CAS-PGE, which simplified its crawling system for met extraction based on files and regExp tags and unified it with the AutoDetectProductCrawler, have caused cas-pge to no longer honor the following blocks from pgeConfig.xml files:
> {code:xml}
> <output>
>   <dir>
>     <files regExp="someRegExp" metWriter="some.class" args="some args"/>
>     <!--...-->
>   </dir>
> </output>
> {code}
> This was a conscious decision, discussed by Brian Foster, myself, and others on several occasions:
> https://issues.apache.org/jira/browse/OODT-426
> http://markmail.org/message/oe5tmutu374wqldb
> I support Brian's implementation but I think we took a step back in not offering backwards compatibility that simply:
> 1. still reads the files tags from pgeConfig.xml shown above, and then;
> 2. constructs the appropriate AutoDetectCrawler and RenamingConventions and other plumbing behind the scenes.
> Note that one of the key features that becomes important in these situations is having CAS-PGE job directories contain the serialized metadata files for offline inspection in case there are errors. We have currently lost support for that (as evidenced by the removal of the met key MET_FILE_EXT). I am also going to add that back in by subclassing AutoDetectProductCrawler in cas-pge and overriding its crawling step to also serialize the met files it generates.
> That will get us back to full forwards and backwards compat support starting in 0.7 for *all* versions of CAS-PGE pgeConfig.xml files. Wish me luck!
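
For readers unfamiliar with the legacy format, step 1 above (reading the files tags) might look roughly like the following sketch. It is not the code committed in r1617057 and it uses no OODT classes: LegacyFilesTagReader and FilesTag are hypothetical names, only the JDK's DOM parser is used, and the input is limited to the fragment quoted in the issue (a real pgeConfig.xml contains more than this block).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of OODT: collects legacy <files> tags.
class LegacyFilesTagReader {

    /** Hypothetical holder for the attributes of one legacy <files> tag. */
    static final class FilesTag {
        final String regExp;
        final String metWriter;
        final String args;

        FilesTag(String regExp, String metWriter, String args) {
            this.regExp = regExp;
            this.metWriter = metWriter;
            this.args = args;
        }
    }

    /** Parses the XML text and returns every <files> element it contains. */
    static List<FilesTag> read(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName("files");
        List<FilesTag> tags = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element files = (Element) nodes.item(i);
            // getAttribute returns an empty string when the attribute is absent.
            tags.add(new FilesTag(
                files.getAttribute("regExp"),
                files.getAttribute("metWriter"),
                files.getAttribute("args")));
        }
        return tags;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<output><dir>"
            + "<files regExp=\"someRegExp\" metWriter=\"some.class\" args=\"some args\"/>"
            + "</dir></output>";
        for (FilesTag tag : read(xml)) {
            System.out.println(tag.regExp + " -> " + tag.metWriter + " [" + tag.args + "]");
        }
    }
}
```

Each FilesTag would then be the input to step 2, where the matching crawler configuration is built behind the scenes; that part depends on OODT internals and is deliberately left out of this sketch.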



--
This message was sent by Atlassian JIRA
(v6.2#6252)