Posted to commits@oodt.apache.org by "Brian Foster (Created) (JIRA)" <ji...@apache.org> on 2012/03/20 09:03:50 UTC
[jira] [Created] (OODT-426) Introduce a CAS-Metadata based renaming interface
Introduce a CAS-Metadata based renaming interface
-------------------------------------------------
Key: OODT-426
URL: https://issues.apache.org/jira/browse/OODT-426
Project: OODT
Issue Type: Sub-task
Components: crawler, metadata container, pge wrapper framework
Affects Versions: 0.3
Environment: none
Reporter: Brian Foster
Assignee: Brian Foster
Priority: Minor
Fix For: 0.4
The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow files to be renamed. CAS-Crawler will then be modified to support specified NamingConventions, which will be run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature that CAS-PGE supports and CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
{code}
public interface NamingConvention {
    public File rename(File file, Metadata metadata)
            throws NamingConventionException;
}
{code}
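To make the intent concrete, here is a minimal sketch of what an implementation of this interface might look like. It is an illustration only, not the code from the patches: the Metadata and NamingConventionException classes below are simplified local stand-ins so the example compiles without OODT on the classpath, and PrefixNamingConvention is a hypothetical implementation invented for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Local stand-in for the CAS-Metadata container (assumption: simple key/value lookup).
class Metadata {
    private final Map<String, String> values = new HashMap<>();
    void addMetadata(String key, String value) { values.put(key, value); }
    String getMetadata(String key) { return values.get(key); }
}

// Local stand-in for the exception named in the proposal.
class NamingConventionException extends Exception {
    NamingConventionException(String message, Throwable cause) { super(message, cause); }
}

// The interface as proposed in this issue.
interface NamingConvention {
    File rename(File file, Metadata metadata) throws NamingConventionException;
}

// Hypothetical implementation: prefixes the file name with one metadata value
// and moves the file on disk to the new name.
class PrefixNamingConvention implements NamingConvention {
    private final String metKey;

    PrefixNamingConvention(String metKey) { this.metKey = metKey; }

    @Override
    public File rename(File file, Metadata metadata) throws NamingConventionException {
        String prefix = metadata.getMetadata(metKey);
        if (prefix == null) {
            throw new NamingConventionException("No value for metadata key " + metKey, null);
        }
        File target = new File(file.getParentFile(), prefix + "_" + file.getName());
        try {
            Files.move(file.toPath(), target.toPath());
        } catch (IOException e) {
            throw new NamingConventionException("Failed to rename " + file, e);
        }
        return target;
    }
}
```

With this sketch, a crawler would call rename(file, metadata) once a file's preconditions have passed and continue with the returned File.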
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247973#comment-13247973 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
bq. On 2012-04-04 18:34:56, Chris Mattmann wrote:
bq. > trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java, line 151
bq. > <https://reviews.apache.org/r/4628/diff/1/?file=98791#file98791line151>
bq. >
bq. > this seems like an ancillary change to this patch. However, it's a useful functionality so I don't feel strongly about separating it out. Just be wary of stuff like this (b/c as it grows) it can take away from the purpose of the patch ;)
ya... thought that too when i was making the change... but i was writing the unit-test for a method that was using it so i just fixed it right now so i don't have to rewrite the unit-test later
bq. On 2012-04-04 18:34:56, Chris Mattmann wrote:
bq. > trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java, line 49
bq. > <https://reviews.apache.org/r/4628/diff/1/?file=98792#file98792line49>
bq. >
bq. > +like
ack
- brian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6686
-----------------------------------------------------------
On 2012-04-06 02:16:10, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4628/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-06 02:16:10)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. CAS-PGE Changes to this issue...
bq. - Renaming and Metadata extraction removed from CAS-PGE and instead CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
bq. trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
bq. trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
bq. trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
bq. trunk/pge/pom.xml 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
bq.
bq. Diff: https://reviews.apache.org/r/4628/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Several Unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245796#comment-13245796 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/
-----------------------------------------------------------
Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
Summary
-------
CAS-PGE Changes to this issue...
- Renaming and Metadata extraction removed from CAS-PGE and instead CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
This addresses bug OODT-426.
https://issues.apache.org/jira/browse/OODT-426
Diffs
-----
trunk/pge/pom.xml 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
Diff: https://reviews.apache.org/r/4628/diff
Testing
-------
Several Unit-tests
Thanks,
brian
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated OODT-426:
-----------------------------------
Fix Version/s: 0.4 (was: 0.5)
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239091#comment-13239091 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6391
-----------------------------------------------------------
Ship it!
LGTM!
- Chris
On 2012-03-27 00:47:30, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-27 00:47:30)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support, which for MetExtractorProductCrawler now takes an ID for the NamingConvention to use and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File)... documented better and is now public and returns the IngestResult for what happened when called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1305657
bq. trunk/crawler/pom.xml 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1305657
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1305657
bq. trunk/crawler/src/main/resources/crawler-config.xml 1305657
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1305657
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/StateAwareProductCrawler.java PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/typedetection/TestMimeExtractorConfigReader.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Wrote several unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
[jira] [Resolved] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "Brian Foster (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Foster resolved OODT-426.
-------------------------------
Resolution: Fixed
- cas-pge fix in r1311492
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234151#comment-13234151 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4412/#review6165
-----------------------------------------------------------
Ship it!
- Chris
On 2012-03-20 08:06:42, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4412/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-20 08:06:42)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This is the CAS-Metadata part of this issue
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/exceptions/NamingConventionException.java PRE-CREATION
bq. trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/NamingConvention.java PRE-CREATION
bq. trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/PathUtilsNamingConvention.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4412/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to add unit-test for PathUtilsNamingConvention
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236431#comment-13236431 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6280
-----------------------------------------------------------
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13623>
Nope, it's backwards compatible... This exception is only thrown if you specify a naming convention ID which does not exist... If you don't specify the ID (which is the case for existing users), then the code will never reach this IF statement, because the IF statement it is contained in will be false
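A rough sketch of that control flow, assuming a plain Map stands in for whatever lookup the crawler really uses. This is not the actual MetExtractorProductCrawler code: NamingConventionLookupSketch, resolveNamingConvention, and UnknownNamingConventionException are hypothetical names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception type for this sketch.
class UnknownNamingConventionException extends Exception {
    UnknownNamingConventionException(String id) {
        super("No naming convention registered for ID '" + id + "'");
    }
}

class NamingConventionLookupSketch {
    // Returns null when no ID was given (the pre-existing behaviour: no renaming),
    // and throws only when an ID was given but is not registered.
    static <T> T resolveNamingConvention(String id, Map<String, T> registry)
            throws UnknownNamingConventionException {
        if (id != null) {
            T convention = registry.get(id);
            if (convention == null) {
                throw new UnknownNamingConventionException(id);
            }
            return convention;
        }
        return null;
    }
}
```

Callers that never set an ID skip the outer IF entirely, so they cannot hit the exception.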
- brian
On 2012-03-22 06:09:52, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-22 06:09:52)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support, which for MetExtractorProductCrawler now takes an ID for the NamingConvention to use and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File)... documented better and is now public and returns the IngestResult for what happened when called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
bq. trunk/crawler/src/main/resources/crawler-config.xml 1302790
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to unit-test up cas-crawler
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233292#comment-13233292 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4412/
-----------------------------------------------------------
Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
Summary
-------
This is the CAS-Metadata part of this issue
This addresses bug OODT-426.
https://issues.apache.org/jira/browse/OODT-426
Diffs
-----
trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/exceptions/NamingConventionException.java PRE-CREATION
trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/NamingConvention.java PRE-CREATION
trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/PathUtilsNamingConvention.java PRE-CREATION
Diff: https://reviews.apache.org/r/4412/diff
Testing
-------
Still need to add a unit test for PathUtilsNamingConvention
Thanks,
brian
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248126#comment-13248126 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6734
-----------------------------------------------------------
Ship it!
- Chris
On 2012-04-06 02:16:10, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4628/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-06 02:16:10)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. CAS-PGE Changes to this issue...
bq. - Renaming and metadata extraction were removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
bq. trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
bq. trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
bq. trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
bq. trunk/pge/pom.xml 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
bq.
bq. Diff: https://reviews.apache.org/r/4628/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Several Unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming
interface
Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Foster updated OODT-426:
------------------------------
Attachment: OODT-426.2012-03-24.cas-crawler.patch.txt
- updated with some unit-tests... almost done
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "Brian Foster (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239258#comment-13239258 ]
Brian Foster commented on OODT-426:
-----------------------------------
- fixed cas-metadata part of patch in r1305744
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "Brian Foster (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239259#comment-13239259 ]
Brian Foster commented on OODT-426:
-----------------------------------
- fixed cas-crawler part of patch in r1305745
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250236#comment-13250236 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
bq. On 2012-04-04 02:12:41, Paul Ramirez wrote:
bq. > trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml, lines 42-43
bq. > <https://reviews.apache.org/r/4628/diff/1/?file=98813#file98813line42>
bq. >
bq. > Put these examples inside comment tags, as they wouldn't work as they stand anyhow. A longer description in the comment would also help (i.e. one or more of these is not as helpful as what it does functionally). Why did we remove the files tag? Is it no longer supported? If it is still supported, I recommend putting it back in but commented out.
bq. >
bq. > For instance, rather than just the metadata keys you want to set, a description of what will be done with that custom metadata would be more useful. An example of multivalued metadata would help too.
bq.
bq. brian Foster wrote:
bq. Added a TODO at the top of this file. The reader for this file still needs to be updated, so when I update it and write the unit tests for it, I'll make this file a working example.
Also, the file tags are no longer supported; use the AutoDetectProductCrawler configuration now to specify which files in the outputDirs should be ingested.
- brian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6670
-----------------------------------------------------------
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245965#comment-13245965 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6670
-----------------------------------------------------------
trunk/pge/src/main/resources/examples/Crawler/action-beans.xml
<https://reviews.apache.org/r/4628/#comment14432>
I'd define these properties in another file and then include them here. This is only a suggestion and not a must, but I see the properties as something that could likely be changed or set to a fixed value, and if we factor them out of here we can keep people from touching this file too much. I think this file makes people's heads spin at first, but the properties don't (i.e. it hides the Spring goodness in a good way).
trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml
<https://reviews.apache.org/r/4628/#comment14431>
Put these examples inside comment tags, as they wouldn't work as they stand anyhow. A longer description in the comment would also help (i.e. one or more of these is not as helpful as what it does functionally). Why did we remove the files tag? Is it no longer supported? If it is still supported, I recommend putting it back in but commented out.
For instance, rather than just the metadata keys you want to set, a description of what will be done with that custom metadata would be more useful. An example of multivalued metadata would help too.
- Paul
On 2012-04-03 21:56:17, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4628/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-03 21:56:17)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. CAS-PGE Changes to this issue...
bq. - Renaming and metadata extraction were removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/pge/pom.xml 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
bq. trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
bq. trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
bq.
bq. Diff: https://reviews.apache.org/r/4628/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Several Unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming
interface
Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Foster updated OODT-426:
------------------------------
Attachment: OODT-426.2012-04-03.cas-pge.txt
- attached patch (cas-pge changes)
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235391#comment-13235391 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/
-----------------------------------------------------------
Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
Summary
-------
- Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
- Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
This addresses bug OODT-426.
https://issues.apache.org/jira/browse/OODT-426
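Read together with the issue description, the crawler-side order of operations is: check preconditions, apply the naming convention, then ingest, with the outcome reported back to the caller. The sketch below illustrates only that ordering. `Renamer`, `Outcome`, and `CrawlFlowSketch` are hypothetical stand-ins invented for this example; the real `ProductCrawler.handleFile(File)` and its result type are in the linked diff and are not reproduced here.

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for NamingConvention, using a plain Map as metadata.
interface Renamer {
    File rename(File file, Map<String, String> metadata) throws Exception;
}

// Hypothetical outcome type; not the crawler's actual result class.
enum Outcome { INGESTED, PRECONDITIONS_FAILED, RENAME_FAILED }

class CrawlFlowSketch {
    private final List<Predicate<File>> preconditions;
    private final Renamer renamer; // null when no naming convention is configured

    CrawlFlowSketch(List<Predicate<File>> preconditions, Renamer renamer) {
        this.preconditions = preconditions;
        this.renamer = renamer;
    }

    // Preconditions first, then renaming, then ingest; the caller gets the outcome.
    Outcome handleFile(File file, Map<String, String> metadata) {
        for (Predicate<File> check : preconditions) {
            if (!check.test(file)) {
                return Outcome.PRECONDITIONS_FAILED;
            }
        }
        File toIngest = file;
        if (renamer != null) {
            try {
                toIngest = renamer.rename(file, metadata);
            } catch (Exception e) {
                return Outcome.RENAME_FAILED;
            }
        }
        ingest(toIngest);
        return Outcome.INGESTED;
    }

    // Placeholder for the actual ingest call.
    private void ingest(File file) {
        System.out.println("ingesting " + file.getName());
    }
}
```

Running the rename only after the preconditions pass means a file that is going to be skipped is never touched on disk, which matches the ordering stated in the issue.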
Diffs
-----
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
trunk/crawler/src/main/resources/crawler-config.xml 1302790
trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
Diff: https://reviews.apache.org/r/4444/diff
Testing
-------
Still need to write unit tests for cas-crawler
Thanks,
brian
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow files to be renamed. CAS-Crawler will then be modified to support specified NamingConventions, which will be run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature CAS-PGE supports that CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>     public File rename(File file, Metadata metadata)
>             throws NamingConventionException;
> }
> {code}
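To make the proposed interface concrete, here is a minimal sketch of one possible implementation. Only the NamingConvention signature comes from the issue description. Metadata and NamingConventionException are small stand-ins for the CAS-Metadata classes, and PrefixNamingConvention and the ProductType key are hypothetical names used for illustration, not part of the patch.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the CAS-Metadata container (assumption: simple key -> value lookup).
class Metadata {
    private final Map<String, String> values = new HashMap<>();

    void addMetadata(String key, String value) { values.put(key, value); }

    String getMetadata(String key) { return values.get(key); }
}

// Stand-in for the exception named in the interface.
class NamingConventionException extends Exception {
    NamingConventionException(String message) { super(message); }
}

// The interface as proposed in the issue description.
interface NamingConvention {
    File rename(File file, Metadata metadata) throws NamingConventionException;
}

// Hypothetical convention: prefix the file name with the value of a metadata key.
class PrefixNamingConvention implements NamingConvention {
    private final String key;

    PrefixNamingConvention(String key) { this.key = key; }

    @Override
    public File rename(File file, Metadata metadata) throws NamingConventionException {
        String prefix = metadata.getMetadata(key);
        if (prefix == null) {
            throw new NamingConventionException("Missing metadata key: " + key);
        }
        // Keep the file in its current directory and change only its name.
        File target = new File(file.getParentFile(), prefix + "_" + file.getName());
        if (!file.renameTo(target)) {
            throw new NamingConventionException("Could not rename " + file + " to " + target);
        }
        return target;
    }
}
```

A crawler could call rename(...) once a file has passed its preconditions and then ingest the returned File instead of the original one.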
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247970#comment-13247970 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
bq. On 2012-04-04 02:12:41, Paul Ramirez wrote:
bq. > trunk/pge/src/main/resources/examples/Crawler/action-beans.xml, lines 29-37
bq. > <https://reviews.apache.org/r/4628/diff/1/?file=98806#file98806line29>
bq. >
bq. > I'd define these properties in another file and then include them here. This is only a suggestion and not a must, but I see the properties as something that is likely to be changed or set to a fixed value, and if we factor them out of here we can keep people from touching this file too much. I think this file just makes people's heads spin at first, but the properties don't (i.e. it hides the Spring goodness in a good way).
done
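The idea behind this suggestion, keeping tunable values in a separate properties file and referencing them from the bean definitions through placeholders, can be sketched in plain Java. This is an illustration only and not how Spring itself resolves placeholders. PlaceholderResolver, the archive.dir key, and the sample values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that substitutes ${key} placeholders from a Properties set.
class PlaceholderResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Parse key=value text in the java.util.Properties format.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Replace every ${key} with its value; unknown keys are left untouched.
    static String resolve(String template, Properties props) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = props.getProperty(matcher.group(1), matcher.group(0));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With this split, someone adjusting a directory or a flag edits only the properties text, and the structural definitions that use ${archive.dir} stay untouched.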
bq. On 2012-04-04 02:12:41, Paul Ramirez wrote:
bq. > trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml, lines 42-43
bq. > <https://reviews.apache.org/r/4628/diff/1/?file=98813#file98813line42>
bq. >
bq. > Put these examples inside comment tags, as they wouldn't work as they stand anyhow. A longer description in the comment would also help (i.e. "one or more of these" is not as helpful as a description of what it does functionally). Why did we remove the files tag? Is it no longer supported? If it still is, then I recommend putting it back in, but commented out.
bq. >
bq. > For instance, rather than "metadata keys you want to set", a description of what will be done with that custom metadata would be of more use. An example of multivalued metadata would help too.
Added a TODO at the top of this file. The reader for this file still needs to be updated, so when I update it and write its unit tests I'll make this file a working example.
- brian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6670
-----------------------------------------------------------
On 2012-04-03 21:56:17, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4628/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-03 21:56:17)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. CAS-PGE Changes to this issue...
bq. - Renaming and metadata extraction removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/pge/pom.xml 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
bq. trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
bq. trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
bq.
bq. Diff: https://reviews.apache.org/r/4628/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Several Unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246563#comment-13246563 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6686
-----------------------------------------------------------
trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java
<https://reviews.apache.org/r/4628/#comment14489>
This seems like an ancillary change to this patch. However, it's useful functionality, so I don't feel strongly about separating it out. Just be wary of stuff like this, because as it grows it can take away from the purpose of the patch ;)
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java
<https://reviews.apache.org/r/4628/#comment14496>
+like
- Chris
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247971#comment-13247971 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/
-----------------------------------------------------------
(Updated 2012-04-06 02:16:10.469275)
Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
Changes
-------
Updates per comments in reviews
Summary
-------
CAS-PGE Changes to this issue...
- Renaming and metadata extraction removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
This addresses bug OODT-426.
https://issues.apache.org/jira/browse/OODT-426
Diffs (updated)
-----
trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
trunk/pge/pom.xml 1302648
trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
Diff: https://reviews.apache.org/r/4628/diff
Testing
-------
Several Unit-tests
Thanks,
brian
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238787#comment-13238787 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
bq. On 2012-03-26 15:50:17, Chris Mattmann wrote:
bq. > trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java, line 326
bq. > <https://reviews.apache.org/r/4444/diff/2/?file=95394#file95394line326>
bq. >
bq. > Are all of these @VisibleForTesting coupling our test system too much with the code? Just wondering...
I am just changing the visibility of these methods from private to package level. I then annotated them with @VisibleForTesting to make it clear to other developers that the only reason these methods are package level is to make them unit-testable.
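The pattern described in this reply can be sketched as follows. This is an illustration only: the local VisibleForTesting annotation is a stand-in for a marker annotation such as the one Guava provides (com.google.common.annotations.VisibleForTesting), and ExampleCrawler, shouldCrawl, and isHidden are hypothetical names rather than code from ProductCrawler.

```java
import java.io.File;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for a @VisibleForTesting marker annotation; it has no runtime effect.
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR, ElementType.TYPE})
@interface VisibleForTesting {}

// Hypothetical crawler with one public entry point and one helper.
class ExampleCrawler {

    // Public API: callers only see the decision for a file.
    public boolean shouldCrawl(File file) {
        return !isHidden(file);
    }

    // Would otherwise be private; package level only so that a test
    // in the same package can call it directly.
    @VisibleForTesting
    boolean isHidden(File file) {
        return file.getName().startsWith(".");
    }
}
```

The annotation does not change behavior. It only records why the method is wider than private, so production code does not depend on the test classes in any way.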
bq. On 2012-03-26 15:50:17, Chris Mattmann wrote:
bq. > trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java, line 71
bq. > <https://reviews.apache.org/r/4444/diff/2/?file=95395#file95395line71>
bq. >
bq. > Should we augment the ProductCrawler super class to declare this function as an abstract method since all sub class crawlers implement it?
It is. This method does need an @Override above it to make that clear; I'll add it.
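As a small illustration of the point about @Override, assuming hypothetical class and method names (BaseCrawler, SimpleCrawler, describe) rather than the actual ProductCrawler code:

```java
import java.io.File;

// Hypothetical base class: the abstract method forces every crawler to implement it.
abstract class BaseCrawler {
    protected abstract String describe(File file);

    // Shared logic that relies on the subclass-provided method.
    public String handle(File file) {
        return "handled " + describe(file);
    }
}

// Hypothetical subclass: @Override documents the link to the superclass and makes
// the compiler reject the method if its signature stops matching the declaration.
class SimpleCrawler extends BaseCrawler {
    @Override
    protected String describe(File file) {
        return file.getName();
    }
}
```

Without the annotation the code still compiles, but a reader cannot tell at a glance that the method implements a superclass contract.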
- brian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6349
-----------------------------------------------------------
On 2012-03-25 01:55:32, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-25 01:55:32)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened when it was called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
bq. trunk/crawler/src/main/resources/crawler-config.xml 1302790
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
bq. trunk/crawler/pom.xml 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to write unit tests for cas-crawler
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239086#comment-13239086 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6387
-----------------------------------------------------------
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java
<https://reviews.apache.org/r/4444/#comment13930>
Ignore this file's changes. I removed them locally; they were added while debugging.
- brian
On 2012-03-27 00:47:30, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-27 00:47:30)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened when it was called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1305657
bq. trunk/crawler/pom.xml 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1305657
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1305657
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1305657
bq. trunk/crawler/src/main/resources/crawler-config.xml 1305657
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1305657
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/StateAwareProductCrawler.java PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/typedetection/TestMimeExtractorConfigReader.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Wrote several unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235417#comment-13235417 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6214
-----------------------------------------------------------
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13396>
Will this cause backward-incompatible changes for people with default crawl scripts that use MetExtractorProductCrawler? If so, can we default it to something to avoid that?
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java
<https://reviews.apache.org/r/4444/#comment13397>
good job adding this!
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java
<https://reviews.apache.org/r/4444/#comment13398>
Does this make it so that the exception is never thrown, and that this is the default?
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java
<https://reviews.apache.org/r/4444/#comment13399>
Same comment as above: does the default naming convention ensure that the exception is never thrown?
- Chris
On 2012-03-22 06:09:52, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-22 06:09:52)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
bq. trunk/crawler/src/main/resources/crawler-config.xml 1302790
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to add unit tests for cas-crawler
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235022#comment-13235022 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4412/#review6185
-----------------------------------------------------------
Ship it!
LGTM
- Chris
On 2012-03-20 08:06:42, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4412/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-20 08:06:42)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This is the CAS-Metadata part of this issue
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/exceptions/NamingConventionException.java PRE-CREATION
bq. trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/NamingConvention.java PRE-CREATION
bq. trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/PathUtilsNamingConvention.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4412/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to add unit tests for PathUtilsNamingConvention
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239083#comment-13239083 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/
-----------------------------------------------------------
(Updated 2012-03-27 00:47:30.189828)
Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
Changes
-------
- Updated unit tests: ProductCrawler now has a unit test for each possible path through handleFile(File), verifying that the appropriate methods were called
Summary
-------
- Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
- Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
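The flow described in this summary (preconditions first, then the naming convention, then ingest, with the outcome reported to the caller) can be sketched roughly as follows. This is an illustration only and not the code under review: CrawlFlowSketch and the Outcome enum are hypothetical names, Outcome is not OODT's IngestResult, and the precondition check, rename step, and ingest step are passed in as plain functions so the example is self-contained.

```java
import java.io.File;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a handleFile(File)-style method that reports what happened.
class CrawlFlowSketch {
    // Assumed outcome values for illustration; the real IngestResult may differ.
    enum Outcome { PRECONDITIONS_FAILED, RENAME_FAILED, INGESTED }

    private final Predicate<File> preconditions;
    private final UnaryOperator<File> namingConvention; // returns null if renaming fails
    private final Consumer<File> ingester;

    CrawlFlowSketch(Predicate<File> preconditions,
                    UnaryOperator<File> namingConvention,
                    Consumer<File> ingester) {
        this.preconditions = preconditions;
        this.namingConvention = namingConvention;
        this.ingester = ingester;
    }

    public Outcome handleFile(File file) {
        // 1. Skip the file unless every precondition passes.
        if (!preconditions.test(file)) {
            return Outcome.PRECONDITIONS_FAILED;
        }
        // 2. Apply the naming convention only after the preconditions have passed.
        File renamed = namingConvention.apply(file);
        if (renamed == null) {
            return Outcome.RENAME_FAILED;
        }
        // 3. Ingest under the new name and report success.
        ingester.accept(renamed);
        return Outcome.INGESTED;
    }
}
```

Returning a value from each exit point is what makes one unit test per path practical: a test can drive a single branch and assert on the result instead of inspecting logs.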
This addresses bug OODT-426.
https://issues.apache.org/jira/browse/OODT-426
Diffs (updated)
-----
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1305657
trunk/crawler/pom.xml 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1305657
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1305657
trunk/crawler/src/main/resources/cmd-line-options.xml 1305657
trunk/crawler/src/main/resources/crawler-config.xml 1305657
trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1305657
trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
trunk/crawler/src/test/org/apache/oodt/cas/crawl/StateAwareProductCrawler.java PRE-CREATION
trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION
trunk/crawler/src/test/org/apache/oodt/cas/crawl/typedetection/TestMimeExtractorConfigReader.java PRE-CREATION
Diff: https://reviews.apache.org/r/4444/diff
Testing (updated)
-------
Wrote several unit-tests
Thanks,
brian
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248127#comment-13248127 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6735
-----------------------------------------------------------
Ship it!
LGTM sounds good.
- Chris
On 2012-04-06 02:16:10, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4628/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-06 02:16:10)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. CAS-PGE changes for this issue:
bq. - Renaming and metadata extraction have been removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
bq. trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
bq. trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
bq. trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
bq. trunk/pge/pom.xml 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
bq.
bq. Diff: https://reviews.apache.org/r/4628/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Several Unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237761#comment-13237761 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/
-----------------------------------------------------------
(Updated 2012-03-25 01:55:32.563950)
Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
Changes
-------
Added some unit tests; a few more to go.
Summary
-------
- Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
- Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
This addresses bug OODT-426.
https://issues.apache.org/jira/browse/OODT-426
Diffs (updated)
-----
trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION
trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
trunk/crawler/src/main/resources/crawler-config.xml 1302790
trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
trunk/crawler/pom.xml 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
Diff: https://reviews.apache.org/r/4444/diff
Testing
-------
Still need to add unit tests for cas-crawler
Thanks,
brian
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246566#comment-13246566 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6694
-----------------------------------------------------------
Ship it!
LGTM, minor comments on my end. Great work. This will cause some headaches for users, but it's worth it, and 0.4 is a game-changing release.
- Chris
On 2012-04-03 21:56:17, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4628/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-03 21:56:17)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. CAS-PGE changes for this issue:
bq. - Renaming and metadata extraction have been removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/pge/pom.xml 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648
bq. trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648
bq. trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION
bq. trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648
bq. trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648
bq. trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781
bq.
bq. Diff: https://reviews.apache.org/r/4628/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Several Unit-tests
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.5
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235416#comment-13235416 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6215
-----------------------------------------------------------
Looking good; happy to review the unit tests when they are there. LGTM, dude, you rule!
- Chris
On 2012-03-22 06:09:52, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-22 06:09:52)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
bq. trunk/crawler/src/main/resources/crawler-config.xml 1302790
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to add unit tests for cas-crawler
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236432#comment-13236432 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
bq. On 2012-03-22 07:15:57, Chris Mattmann wrote:
bq. > trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java, line 90
bq. > <https://reviews.apache.org/r/4444/diff/1/?file=94495#file94495line90>
bq. >
bq. > Does this make it so that the Exception is never thrown and that this is the default?
This just allows you to specify a default. See my other comment about the exception.
- brian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6214
-----------------------------------------------------------
On 2012-03-22 06:09:52, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-22 06:09:52)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened when it was called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
bq. trunk/crawler/src/main/resources/crawler-config.xml 1302790
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to unit-test cas-crawler
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming
interface
Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Foster updated OODT-426:
------------------------------
Attachment: OODT-426.2012-03-20.cas-metadata.patch.txt
- attached cas-metadata part of patch
[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming
interface
Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Foster updated OODT-426:
------------------------------
Attachment: OODT-426.2012-03-20.cas-crawler.patch.txt
- attached cas-crawler part of patch
[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based
renaming interface
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238502#comment-13238502 ]
jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6349
-----------------------------------------------------------
Ship it!
My comments are pretty minor, but check 'em out. LGTM.
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13774>
Are all of these @VisibleForTesting annotations coupling our test system too tightly with the code? Just wondering...
trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13775>
Should we augment the ProductCrawler superclass to declare this function as an abstract method, since all subclass crawlers implement it?
- Chris
On 2012-03-25 01:55:32, brian Foster wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4444/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-25 01:55:32)
bq.
bq.
bq. Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq. - Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened when it was called.
bq.
bq.
bq. This addresses bug OODT-426.
bq. https://issues.apache.org/jira/browse/OODT-426
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION
bq. trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION
bq. trunk/crawler/src/main/resources/cmd-line-options.xml 1302790
bq. trunk/crawler/src/main/resources/crawler-config.xml 1302790
bq. trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790
bq. trunk/crawler/pom.xml 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790
bq. trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790
bq.
bq. Diff: https://reviews.apache.org/r/4444/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Still need to unit-test cas-crawler
bq.
bq.
bq. Thanks,
bq.
bq. brian
bq.
bq.
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
> Key: OODT-426
> URL: https://issues.apache.org/jira/browse/OODT-426
> Project: OODT
> Issue Type: Sub-task
> Components: crawler, metadata container, pge wrapper framework
> Affects Versions: 0.3
> Environment: none
> Reporter: Brian Foster
> Assignee: Brian Foster
> Priority: Minor
> Fix For: 0.4
>
> Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow for renaming of files. CAS-Crawler will then be modified to support specified NamingConventions, which will be run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature that CAS-PGE supports and CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>   public File rename(File file, Metadata metadata)
>       throws NamingConventionException;
> }
> {code}
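As a rough illustration of how the interface quoted above might be implemented, here is a minimal sketch. It is not code from the attached patches. The Metadata and NamingConventionException classes below are simplified stand-ins so the example compiles on its own, and PrefixNamingConvention, its metadataKey parameter, and the targetFor helper are hypothetical names introduced only for this example.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the CAS-Metadata container (assumption: reduced to a
// single-valued key/value lookup so the sketch is self-contained).
class Metadata {
    private final Map<String, String> values = new HashMap<>();

    void addMetadata(String key, String value) { values.put(key, value); }

    String getMetadata(String key) { return values.get(key); }
}

// Stand-in for the exception named in the issue description.
class NamingConventionException extends Exception {
    NamingConventionException(String message) { super(message); }
}

// The interface as quoted in the issue description.
interface NamingConvention {
    File rename(File file, Metadata metadata) throws NamingConventionException;
}

// Hypothetical implementation: prefixes the file name with a metadata value.
class PrefixNamingConvention implements NamingConvention {
    private final String metadataKey;

    PrefixNamingConvention(String metadataKey) { this.metadataKey = metadataKey; }

    // Computes the target file, e.g. "data.txt" -> "<value>_data.txt",
    // without touching the file system.
    File targetFor(File file, Metadata metadata) throws NamingConventionException {
        String prefix = metadata.getMetadata(metadataKey);
        if (prefix == null || prefix.isEmpty()) {
            throw new NamingConventionException(
                "No value for metadata key '" + metadataKey + "'");
        }
        return new File(file.getParentFile(), prefix + "_" + file.getName());
    }

    @Override
    public File rename(File file, Metadata metadata) throws NamingConventionException {
        File target = targetFor(file, metadata);
        // File.renameTo signals failure through its return value, not an exception.
        if (!file.renameTo(target)) {
            throw new NamingConventionException(
                "Could not rename " + file + " to " + target);
        }
        return target;
    }
}
```

Returning the renamed File lets the caller (for example, a crawler that has just checked its preconditions) continue with the new path. Splitting out the name computation is only a convenience for testing the naming rule without touching the disk.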