Posted to commits@oodt.apache.org by "Brian Foster (Created) (JIRA)" <ji...@apache.org> on 2012/03/20 09:03:50 UTC

[jira] [Created] (OODT-426) Introduce a CAS-Metadata based renaming interface

Introduce a CAS-Metadata based renaming interface
-------------------------------------------------

                 Key: OODT-426
                 URL: https://issues.apache.org/jira/browse/OODT-426
             Project: OODT
          Issue Type: Sub-task
          Components: crawler, metadata container, pge wrapper framework
    Affects Versions: 0.3
         Environment: none
            Reporter: Brian Foster
            Assignee: Brian Foster
            Priority: Minor
             Fix For: 0.4


The idea here is that CAS-Metadata will introduce a new NamingConvention interface for renaming files.  CAS-Crawler will then be modified to support configured NamingConventions, which will run after all preconditions have passed for a given file.  This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, standardizing file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). File renaming is the only feature CAS-PGE supports that CAS-Crawler lacks, and this new NamingConvention interface will provide it.  Here is what the NamingConvention interface will look like:

{code}
public interface NamingConvention {

   public File rename(File file, Metadata metadata)
         throws NamingConventionException;
}
{code}
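
To make the contract concrete, here is a minimal, self-contained sketch of what one implementation might look like. It is not the actual CAS-Metadata code: a Map<String, String> stands in for Metadata, the exception is redeclared locally, and PrefixNamingConvention is a hypothetical convention invented for illustration. It also assumes that rename moves the file on disk and returns the new File.

```java
import java.io.File;
import java.util.Map;

public class NamingConventionSketch {

    // Local stand-in for the proposed exception (assumption: a checked exception).
    static class NamingConventionException extends Exception {
        NamingConventionException(String message) {
            super(message);
        }
    }

    // Same shape as the proposed interface; Map<String, String> stands in for Metadata.
    interface NamingConvention {
        File rename(File file, Map<String, String> metadata)
                throws NamingConventionException;
    }

    // Hypothetical convention: prefix the file name with the value of one metadata key.
    static class PrefixNamingConvention implements NamingConvention {
        private final String key;

        PrefixNamingConvention(String key) {
            this.key = key;
        }

        @Override
        public File rename(File file, Map<String, String> metadata)
                throws NamingConventionException {
            String prefix = metadata.get(key);
            if (prefix == null) {
                throw new NamingConventionException("Missing metadata key: " + key);
            }
            // Keep the file in its directory; only the name changes.
            File renamed = new File(file.getParentFile(), prefix + "_" + file.getName());
            if (!file.renameTo(renamed)) {
                throw new NamingConventionException(
                        "Failed to rename " + file + " to " + renamed);
            }
            return renamed;
        }
    }
}
```

Under these assumptions, a crawler would call rename only after a file's preconditions have passed and would continue processing with the returned File.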

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247973#comment-13247973 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------



bq.  On 2012-04-04 18:34:56, Chris Mattmann wrote:
bq.  > trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java, line 151
bq.  > <https://reviews.apache.org/r/4628/diff/1/?file=98791#file98791line151>
bq.  >
bq.  >     this seems like an ancillary change to this patch. However, it's a useful functionality so I don't feel strongly about separating it out. Just be wary of stuff like this (b/c as it grows) it can take away from the purpose of the patch ;)

ya... thought that too when i was making the change... but i was writing the unit-test for a method that was using it so i just fixed it right now so i don't have to rewrite the unit-test later


bq.  On 2012-04-04 18:34:56, Chris Mattmann wrote:
bq.  > trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java, line 49
bq.  > <https://reviews.apache.org/r/4628/diff/1/?file=98792#file98792line49>
bq.  >
bq.  >     +like

ack


- brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6686
-----------------------------------------------------------


On 2012-04-06 02:16:10, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4628/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-06 02:16:10)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  CAS-PGE changes for this issue...
bq.  - Renaming and metadata extraction removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
bq.    trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
bq.    trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
bq.    trunk/pge/pom.xml 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
bq.  
bq.  Diff: https://reviews.apache.org/r/4628/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Several Unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245796#comment-13245796 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/
-----------------------------------------------------------

Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.


Summary
-------

CAS-PGE changes for this issue...
- Renaming and metadata extraction removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler


This addresses bug OODT-426.
    https://issues.apache.org/jira/browse/OODT-426


Diffs
-----

  trunk/pge/pom.xml 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
  trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
  trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
  trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 

Diff: https://reviews.apache.org/r/4628/diff


Testing
-------

Several Unit-tests


Thanks,

brian


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt

        

[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated OODT-426:
-----------------------------------

    Fix Version/s:     (was: 0.5)
                   0.4
    
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239091#comment-13239091 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6391
-----------------------------------------------------------

Ship it!


LGTM!

- Chris


On 2012-03-27 00:47:30, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-27 00:47:30)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when called.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1305657 
bq.    trunk/crawler/pom.xml 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1305657 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1305657 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1305657 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1305657 
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/StateAwareProductCrawler.java PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/typedetection/TestMimeExtractorConfigReader.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Wrote several unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt

        

[jira] [Resolved] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Foster resolved OODT-426.
-------------------------------

    Resolution: Fixed

- cas-pge fix in r1311492
                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234151#comment-13234151 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4412/#review6165
-----------------------------------------------------------

Ship it!


- Chris


On 2012-03-20 08:06:42, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4412/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-20 08:06:42)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This is the CAS-Metadata part of this issue
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/exceptions/NamingConventionException.java PRE-CREATION 
bq.    trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/NamingConvention.java PRE-CREATION 
bq.    trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/PathUtilsNamingConvention.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4412/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to add unit-test for PathUtilsNamingConvention
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236431#comment-13236431 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6280
-----------------------------------------------------------



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13623>

    Nope, it's backwards compatible. This exception is only thrown if you specify a naming convention ID which does not exist. If you don't specify an ID, which is the case for existing users, the code never reaches this IF statement, because the IF statement that contains it evaluates to false.
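
    The control flow described here can be sketched roughly as follows. This is an illustration only, not the code under review: the class name, the map-based registry, and the use of IllegalArgumentException are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class NamingConventionLookupSketch {

    // Hypothetical registry of conventions keyed by ID (stand-in for the real lookup).
    private final Map<String, Object> conventionsById = new HashMap<>();

    void register(String id, Object convention) {
        conventionsById.put(id, convention);
    }

    // Returns null when no ID is configured, so existing setups behave as before.
    // Throws only when an ID was given but no convention is registered under it.
    Object resolve(String namingConventionId) {
        if (namingConventionId != null) {
            Object convention = conventionsById.get(namingConventionId);
            if (convention == null) {
                throw new IllegalArgumentException(
                        "No naming convention with ID: " + namingConventionId);
            }
            return convention;
        }
        return null;
    }
}
```

    With no ID configured, resolve(null) returns null and the inner check is never evaluated, which is the backwards-compatible path.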


- brian


On 2012-03-22 06:09:52, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-22 06:09:52)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when called.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1302790 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to unit-test cas-crawler
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow for renaming of files.  CAS-Crawler will then be modified to support specified NamingConventions which will be run after all preconditions have passed for a given file.  This will then allow CAS-PGE to then use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize across the board for file extraction (currently CAS-PGE has it's own file extraction interface which uses regular expression to determine files which should be extracted and ingested). The only missing feature in CAS-Crawler which CAS-PGE supports is file renaming, which this new NamingConvention interface will introduce.  Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>    public File rename(File file, Metadata metadata)
>          throws NamingConventionException;
> }
> {code}
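
As a rough illustration of how an implementation of this interface might look, here is a minimal, self-contained sketch. The Metadata and NamingConventionException types below are simplified stand-ins for the real CAS-Metadata classes, and PrefixNamingConvention (with its key parameter) is a hypothetical name that is not part of the patch; it simply prefixes the file name with the value of one metadata key.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the CAS-Metadata container (assumption: single-valued keys only).
class Metadata {
    private final Map<String, String> values = new HashMap<>();

    void addMetadata(String key, String value) {
        values.put(key, value);
    }

    String getMetadata(String key) {
        return values.get(key);
    }
}

// Stand-in for the exception type named in the interface above.
class NamingConventionException extends Exception {
    NamingConventionException(String message) {
        super(message);
    }
}

// The interface as given in the issue description.
interface NamingConvention {
    File rename(File file, Metadata metadata) throws NamingConventionException;
}

// Hypothetical convention: "<value of key>_<original file name>", in the same directory.
class PrefixNamingConvention implements NamingConvention {
    private final String key;

    PrefixNamingConvention(String key) {
        this.key = key;
    }

    @Override
    public File rename(File file, Metadata metadata) throws NamingConventionException {
        String prefix = metadata.getMetadata(key);
        if (prefix == null) {
            throw new NamingConventionException("No metadata value for key '" + key + "'");
        }
        File target = new File(file.getParentFile(), prefix + "_" + file.getName());
        // File.renameTo reports failure through its return value rather than an exception.
        if (!file.renameTo(target)) {
            throw new NamingConventionException(
                "Failed to rename " + file + " to " + target);
        }
        return target;
    }
}
```

Under this sketch, a crawler would call rename(file, metadata) once a file's preconditions have passed and carry on with the returned File; how the real CAS-Crawler wires this in is defined by the attached patches, not by this example.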


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233292#comment-13233292 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4412/
-----------------------------------------------------------

Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.


Summary
-------

This is the CAS-Metadata part of this issue


This addresses bug OODT-426.
    https://issues.apache.org/jira/browse/OODT-426


Diffs
-----

  trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/exceptions/NamingConventionException.java PRE-CREATION 
  trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/NamingConvention.java PRE-CREATION 
  trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/PathUtilsNamingConvention.java PRE-CREATION 

Diff: https://reviews.apache.org/r/4412/diff


Testing
-------

Still need to add a unit test for PathUtilsNamingConvention


Thanks,

brian


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt
>
>

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248126#comment-13248126 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6734
-----------------------------------------------------------

Ship it!


- Chris


On 2012-04-06 02:16:10, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4628/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-06 02:16:10)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  CAS-PGE changes for this issue:
bq.  - Renaming and metadata extraction have been removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
bq.    trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
bq.    trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
bq.    trunk/pge/pom.xml 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
bq.  
bq.  Diff: https://reviews.apache.org/r/4628/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Several Unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>

        

[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Foster updated OODT-426:
------------------------------

    Attachment: OODT-426.2012-03-24.cas-crawler.patch.txt

- updated with some unit-tests... almost done
                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239258#comment-13239258 ] 

Brian Foster commented on OODT-426:
-----------------------------------

- fixed cas-metadata part of patch in r1305744
                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239259#comment-13239259 ] 

Brian Foster commented on OODT-426:
-----------------------------------

- fixed cas-crawler part of patch in r1305745
                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250236#comment-13250236 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------



bq.  On 2012-04-04 02:12:41, Paul Ramirez wrote:
bq.  > trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml, lines 42-43
bq.  > <https://reviews.apache.org/r/4628/diff/1/?file=98813#file98813line42>
bq.  >
bq.  >     Put these examples inside comment tags, as they wouldn't work as written anyhow. A longer description in the comment would also help (i.e. "one or more of these" is not as helpful as saying what it does functionally). Why did we remove the files tag? Is it no longer supported? If it is still supported, I recommend putting it back in but commented out.
bq.  >     
bq.  >     For instance, instead of "metadata keys you want to set", a description of what will be done with that custom metadata would be of more use. An example of multivalued metadata would help too.
bq.  
bq.  brian Foster wrote:
bq.      Added a TODO at the top of this file... The reader for this file still needs to be updated... so when i update it i'll make this file a working example when i write the unit-tests for it

Also the file tags are no longer supported... use AutoDetectProductCrawler configuration now to specify which files in the outputDirs should be ingested


- brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6670
-----------------------------------------------------------




                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245965#comment-13245965 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6670
-----------------------------------------------------------



trunk/pge/src/main/resources/examples/Crawler/action-beans.xml
<https://reviews.apache.org/r/4628/#comment14432>

    I'd define these properties in another file and then include them here. This is only a suggestion and not a must, but I see the properties as something that could likely be changed or set to a fixed value, and if we factor them out of here we can keep people from touching this file too much. I think this file just makes people's heads spin at first, but the properties don't (i.e. it hides the Spring goodness in a good way).



trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml
<https://reviews.apache.org/r/4628/#comment14431>

    Put these examples inside comment tags, as they wouldn't work as written anyhow. A longer description in the comment would also help (i.e. "one or more of these" is not as helpful as saying what it does functionally). Why did we remove the files tag? Is it no longer supported? If it is still supported, I recommend putting it back in but commented out.
    
    For instance, instead of "metadata keys you want to set", a description of what will be done with that custom metadata would be of more use. An example of multivalued metadata would help too.


- Paul


On 2012-04-03 21:56:17, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4628/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-03 21:56:17)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  CAS-PGE changes for this issue:
bq.  - Renaming and metadata extraction have been removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/pge/pom.xml 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
bq.    trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
bq.    trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
bq.  
bq.  Diff: https://reviews.apache.org/r/4628/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Several Unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>

        

[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Foster updated OODT-426:
------------------------------

    Attachment: OODT-426.2012-04-03.cas-pge.txt

- attached patch (cas-pge changes)
                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>


[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235391#comment-13235391 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/
-----------------------------------------------------------

Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.


Summary
-------

- Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
- Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
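
As a rough sketch of the ordering described in the issue (preconditions first, then the naming convention, then ingest) and of a handleFile(File) that reports its outcome: everything below is a hypothetical stand-in, not the actual CAS-Crawler code. The IngestResult values, the SketchCrawler class, and the use of Predicate and UnaryOperator in place of the real precondition and NamingConvention types are all assumptions made for illustration. The naming step here only computes a new File object and does not touch the filesystem.

```java
import java.io.File;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the crawler's result type; the real IngestResult may differ.
enum IngestResult { SKIPPED, FAILED, SUCCEEDED }

class SketchCrawler {
    private final List<Predicate<File>> preconditions;
    private final UnaryOperator<File> namingConvention; // stand-in for NamingConvention.rename
    File lastIngested; // recorded only so the sketch can be inspected

    SketchCrawler(List<Predicate<File>> preconditions, UnaryOperator<File> namingConvention) {
        this.preconditions = preconditions;
        this.namingConvention = namingConvention;
    }

    public IngestResult handleFile(File file) {
        // 1. Every precondition must pass before anything else happens.
        for (Predicate<File> precondition : preconditions) {
            if (!precondition.test(file)) {
                return IngestResult.SKIPPED;
            }
        }
        // 2. Only then is the naming convention applied.
        File renamed;
        try {
            renamed = namingConvention.apply(file);
        } catch (RuntimeException e) {
            return IngestResult.FAILED;
        }
        // 3. The (possibly renamed) file is what gets ingested; ingest is a placeholder here.
        lastIngested = renamed;
        return IngestResult.SUCCEEDED;
    }
}

class CrawlFlowDemo {
    public static void main(String[] args) {
        Predicate<File> isDat = f -> f.getName().endsWith(".dat");
        UnaryOperator<File> prefix = f -> new File(f.getParentFile(), "renamed_" + f.getName());
        SketchCrawler crawler = new SketchCrawler(Collections.singletonList(isDat), prefix);

        System.out.println(crawler.handleFile(new File("in/notes.txt")));
        System.out.println(crawler.handleFile(new File("in/granule.dat")));
    }
}
```

Returning a result object rather than void is what makes the method straightforward to assert on in a unit test.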


This addresses bug OODT-426.
    https://issues.apache.org/jira/browse/OODT-426


Diffs
-----

  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
  trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
  trunk/crawler/src/main/resources/crawler-config.xml 1302790 
  trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
  trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 

Diff: https://reviews.apache.org/r/4444/diff


Testing
-------

Still need to write unit tests for cas-crawler


Thanks,

brian


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
>
>


[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247970#comment-13247970 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------



bq.  On 2012-04-04 02:12:41, Paul Ramirez wrote:
bq.  > trunk/pge/src/main/resources/examples/Crawler/action-beans.xml, lines 29-37
bq.  > <https://reviews.apache.org/r/4628/diff/1/?file=98806#file98806line29>
bq.  >
bq.  >     I'd define these properties in another file and then include them here. This is only a suggestion, not a must, but I see the properties as something likely to be changed or set to a fixed value, and if we factor them out of here we can keep people from touching this file too much. I think this file makes people's heads spin at first, but the properties don't (i.e. it hides the Spring goodness in a good way).

done


bq.  On 2012-04-04 02:12:41, Paul Ramirez wrote:
bq.  > trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml, lines 42-43
bq.  > <https://reviews.apache.org/r/4628/diff/1/?file=98813#file98813line42>
bq.  >
bq.  >     Put these examples inside comment tags, as they wouldn't work as they existed anyhow. Also, putting a longer description in the comment would help (i.e. "one or more of these" is not as helpful as saying what it does functionally). Why did we remove the files tag? Is this no longer supported? If it is, then I recommend putting it back in but commented out.
bq.  >     
bq.  >     For instance, instead of "metadata keys you want to set", a description of what will be done with that custom metadata would be of more use. An example of multivalued metadata would help too.

Added a TODO at the top of this file. The reader for this file still needs to be updated, so I'll make this file a working example when I update the reader and write its unit tests.


- brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6670
-----------------------------------------------------------


On 2012-04-03 21:56:17, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4628/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-03 21:56:17)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  CAS-PGE changes for this issue:
bq.  - Renaming and metadata extraction were removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/pge/pom.xml 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
bq.    trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
bq.    trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
bq.  
bq.  Diff: https://reviews.apache.org/r/4628/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Several Unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>


[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246563#comment-13246563 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6686
-----------------------------------------------------------



trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java
<https://reviews.apache.org/r/4628/#comment14489>

    This seems like an ancillary change to this patch. However, it's useful functionality, so I don't feel strongly about separating it out. Just be wary of stuff like this, b/c as it grows it can take away from the purpose of the patch ;)



trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java
<https://reviews.apache.org/r/4628/#comment14496>

    +like


- Chris




                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>


[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247971#comment-13247971 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/
-----------------------------------------------------------

(Updated 2012-04-06 02:16:10.469275)


Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.


Changes
-------

Updates per comments in reviews


Summary
-------

CAS-PGE changes for this issue:
- Renaming and metadata extraction were removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler


This addresses bug OODT-426.
    https://issues.apache.org/jira/browse/OODT-426


Diffs (updated)
-----

  trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
  trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
  trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
  trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
  trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
  trunk/pge/pom.xml 1302648 
  trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 

Diff: https://reviews.apache.org/r/4628/diff


Testing
-------

Several Unit-tests


Thanks,

brian


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>


[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238787#comment-13238787 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------



bq.  On 2012-03-26 15:50:17, Chris Mattmann wrote:
bq.  > trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java, line 326
bq.  > <https://reviews.apache.org/r/4444/diff/2/?file=95394#file95394line326>
bq.  >
bq.  >     Are all of these @VisibleForTesting annotations coupling our test system too much with the code? Just wondering...

I'm just changing the visibility of these methods from private to package level, then annotating them with @VisibleForTesting to make it clear to other developers that the only reason these methods are package level is to make them unit-testable.
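
A minimal sketch of the pattern being described. Guava provides an annotation named @VisibleForTesting (com.google.common.annotations.VisibleForTesting); whether the patch uses that one is an assumption, and the sketch declares a local stand-in annotation so it compiles without any dependency. ExampleCrawler and passesNameCheck are hypothetical names, not methods from ProductCrawler.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for a @VisibleForTesting annotation; it is documentation only
// and has no effect on how the code runs.
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR, ElementType.TYPE})
@interface VisibleForTesting {}

class ExampleCrawler {
    // Public entry point used by production code.
    public String describe(String fileName) {
        return passesNameCheck(fileName) ? "accepted" : "rejected";
    }

    // Was private; now package-private only so a test in the same package can call it.
    @VisibleForTesting
    boolean passesNameCheck(String fileName) {
        return fileName != null && !fileName.startsWith(".");
    }
}

class ExampleCrawlerCheck {
    public static void main(String[] args) {
        ExampleCrawler crawler = new ExampleCrawler();
        // A test class in the same package can exercise the helper directly.
        System.out.println(crawler.passesNameCheck(".hidden"));
        System.out.println(crawler.describe("granule.dat"));
    }
}
```

The annotation does not change access rules; it only records why the method is wider than private, so the coupling to the tests is limited to the visibility change itself.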


bq.  On 2012-03-26 15:50:17, Chris Mattmann wrote:
bq.  > trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java, line 71
bq.  > <https://reviews.apache.org/r/4444/diff/2/?file=95395#file95395line71>
bq.  >
bq.  >     Should we augment the ProductCrawler super class to declare this function as an abstract method since all sub class crawlers implement it?

It is. This method does need an @Override above it to make that clear; I'll add it.
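
For readers unfamiliar with the point about @Override, a small sketch: when the superclass already declares the method as abstract, adding @Override in the subclass makes the relationship visible and lets the compiler reject a mismatched signature. BaseCrawler, StdCrawler and describeFile are hypothetical names; the review does not say which method at line 71 of StdProductCrawler is meant.

```java
// Hypothetical superclass: declares the hook that every subclass must implement.
abstract class BaseCrawler {
    protected abstract String describeFile(String name);

    // Shared logic calls the hook without knowing the concrete subclass.
    public String crawl(String name) {
        return describeFile(name);
    }
}

// Hypothetical subclass: @Override documents that this implements the abstract method,
// and the compiler reports an error if the signature ever stops matching.
class StdCrawler extends BaseCrawler {
    @Override
    protected String describeFile(String name) {
        return "std:" + name;
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        BaseCrawler crawler = new StdCrawler();
        System.out.println(crawler.crawl("granule.dat"));
    }
}
```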


- brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6349
-----------------------------------------------------------


On 2012-03-25 01:55:32, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-25 01:55:32)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1302790 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
bq.    trunk/crawler/pom.xml 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to write unit tests for cas-crawler
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>


[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239086#comment-13239086 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6387
-----------------------------------------------------------



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java
<https://reviews.apache.org/r/4444/#comment13930>

    Ignore this file's changes. I removed them locally; they were added while debugging.


- brian


On 2012-03-27 00:47:30, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-27 00:47:30)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened during the call.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1305657 
bq.    trunk/crawler/pom.xml 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1305657 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1305657 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1305657 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1305657 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1305657 
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/StateAwareProductCrawler.java PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/typedetection/TestMimeExtractorConfigReader.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Wrote several unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235417#comment-13235417 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6214
-----------------------------------------------------------



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13396>

    Will this cause backwards-incompatible changes for people with default crawl scripts that use MetExtractorProductCrawler? If so, can we default it to something to avoid that?



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java
<https://reviews.apache.org/r/4444/#comment13397>

    good job adding this!



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java
<https://reviews.apache.org/r/4444/#comment13398>

    Does this make it so that the Exception is never thrown and that this is the default?



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java
<https://reviews.apache.org/r/4444/#comment13399>

    same comment as above -- default naming convention ensures that exception is never thrown?


- Chris


On 2012-03-22 06:09:52, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-22 06:09:52)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened during the call.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1302790 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to unit-test up cas-crawler
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
>
>


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235022#comment-13235022 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4412/#review6185
-----------------------------------------------------------

Ship it!


LGTM

- Chris


On 2012-03-20 08:06:42, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4412/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-20 08:06:42)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This is the CAS-Metadata part of this issue
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/exceptions/NamingConventionException.java PRE-CREATION 
bq.    trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/NamingConvention.java PRE-CREATION 
bq.    trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/filenaming/PathUtilsNamingConvention.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4412/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to add unit-test for PathUtilsNamingConvention
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt
>
>


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239083#comment-13239083 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/
-----------------------------------------------------------

(Updated 2012-03-27 00:47:30.189828)


Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.


Changes
-------

- Updated unit-tests -- ProductCrawler has a unit-test for each possible path that can be taken through handleFile(File) and verifies that the appropriate methods were called


Summary
-------

- Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
- Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened during the call.
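
As a rough sketch of the control flow described in this issue (naming conventions run only after all preconditions have passed, and handleFile(File) reports the outcome), something like the following could apply. This is not the patched ProductCrawler: the Result enum is a hypothetical stand-in for IngestResult, whose actual values are not shown in this thread, and the precondition, renaming, and ingestion steps are reduced to simple functions. Metadata extraction is left out entirely.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

class HandleFileSketch {

    // Hypothetical outcomes; the real IngestResult may use different values.
    enum Result { PRECONDITIONS_FAILED, RENAME_FAILED, INGESTED }

    private final List<Predicate<File>> preconditions;
    // Stand-in for a NamingConvention: maps a file to its new name, or null on failure.
    private final UnaryOperator<File> namingConvention;
    // Stand-in for ingestion: records the files that would have been ingested.
    private final List<File> ingested = new ArrayList<>();

    HandleFileSketch(List<Predicate<File>> preconditions,
                     UnaryOperator<File> namingConvention) {
        this.preconditions = preconditions;
        this.namingConvention = namingConvention;
    }

    public Result handleFile(File file) {
        // Step 1: every precondition must pass before anything else happens.
        for (Predicate<File> precondition : preconditions) {
            if (!precondition.test(file)) {
                return Result.PRECONDITIONS_FAILED; // renaming is never attempted
            }
        }
        // Step 2: apply the naming convention to the accepted file.
        File renamed = namingConvention.apply(file);
        if (renamed == null) {
            return Result.RENAME_FAILED;
        }
        // Step 3: ingest under the new name.
        ingested.add(renamed);
        return Result.INGESTED;
    }

    List<File> getIngested() { return ingested; }
}
```

Returning a result value instead of void is what makes a per-path unit test straightforward: each branch above maps to exactly one return value that a test can assert on.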


This addresses bug OODT-426.
    https://issues.apache.org/jira/browse/OODT-426


Diffs (updated)
-----

  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1305657 
  trunk/crawler/pom.xml 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MetExtractorSpec.java 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1305657 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1305657 
  trunk/crawler/src/main/resources/cmd-line-options.xml 1305657 
  trunk/crawler/src/main/resources/crawler-config.xml 1305657 
  trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1305657 
  trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
  trunk/crawler/src/test/org/apache/oodt/cas/crawl/StateAwareProductCrawler.java PRE-CREATION 
  trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION 
  trunk/crawler/src/test/org/apache/oodt/cas/crawl/typedetection/TestMimeExtractorConfigReader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/4444/diff


Testing (updated)
-------

Wrote several unit-tests


Thanks,

brian


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248127#comment-13248127 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6735
-----------------------------------------------------------

Ship it!


LGTM sounds good.

- Chris


On 2012-04-06 02:16:10, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4628/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-06 02:16:10)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  CAS-PGE Changes to this issue...
bq.  - Renaming and metadata extraction removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
bq.    trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
bq.    trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/filename.extractor.config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.properties PRE-CREATION 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
bq.    trunk/pge/pom.xml 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
bq.  
bq.  Diff: https://reviews.apache.org/r/4628/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Several Unit-tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237761#comment-13237761 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/
-----------------------------------------------------------

(Updated 2012-03-25 01:55:32.563950)


Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.


Changes
-------

with some unit-tests... few more to go


Summary
-------

- Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
- Also cleaned up handleFile(File): it is better documented, is now public, and returns an IngestResult describing what happened during the call.


This addresses bug OODT-426.
    https://issues.apache.org/jira/browse/OODT-426


Diffs (updated)
-----

  trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
  trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION 
  trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
  trunk/crawler/src/main/resources/crawler-config.xml 1302790 
  trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
  trunk/crawler/pom.xml 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
  trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 

Diff: https://reviews.apache.org/r/4444/diff


Testing
-------

Still need to unit-test up cas-crawler


Thanks,

brian


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246566#comment-13246566 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4628/#review6694
-----------------------------------------------------------

Ship it!


LGTM, minor comments on my end. Great work. This will cause some user headaches, but it's worth it and 0.4 is a game-changing release.

- Chris


On 2012-04-03 21:56:17, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4628/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-03 21:56:17)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  CAS-PGE Changes to this issue...
bq.  - Renaming and metadata extraction removed from CAS-PGE; CAS-PGE now uses AutoDetectProductCrawler instead of StdProductCrawler
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/pge/pom.xml 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/PGETaskInstance.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/OutputDir.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfig.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/PgeConfigMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RegExprOutputFiles.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/RenamingConv.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/config/XmlFilePgeConfigBuilder.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetKeys.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/ExternExtractorMetWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/FilenameExtractorWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/PcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/SciPgeConfigFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/metlist/MetadataListPcsMetFileWriter.java 1302648 
bq.    trunk/pge/src/main/java/org/apache/oodt/cas/pge/writers/xslt/XslTransformWriter.java 1302648 
bq.    trunk/pge/src/main/resources/examples/Crawler/action-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/crawler-config.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-extractor-map.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/mime-types.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/naming-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/Crawler/precondition-beans.xml PRE-CREATION 
bq.    trunk/pge/src/main/resources/examples/MetadataOutputFiles/metadata-output.xml 1302648 
bq.    trunk/pge/src/main/resources/examples/PgeConfigFiles/pge-config.xml 1302648 
bq.    trunk/pge/src/test/org/apache/oodt/cas/pge/TestPGETaskInstance.java 1302781 
bq.  
bq.  Diff: https://reviews.apache.org/r/4628/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Several unit tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt, OODT-426.2012-04-03.cas-pge.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow files to be renamed. CAS-Crawler will then be modified to support specified NamingConventions, which will run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature CAS-PGE supports that CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>    public File rename(File file, Metadata metadata)
>          throws NamingConventionException;
> }
> {code}
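
To make the proposed interface concrete, here is a minimal sketch of one possible implementation. Everything other than the NamingConvention signature quoted above is an assumption made for illustration: the Metadata and NamingConventionException classes below are small stand-ins rather than the real CAS-Metadata types, and PrefixNamingConvention and the "ProductType" key are hypothetical.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class NamingConventionSketch {

    /** Stand-in for the CAS-Metadata container (assumption, not the real class). */
    static class Metadata {
        private final Map<String, String> values = new HashMap<>();

        void addMetadata(String key, String value) {
            values.put(key, value);
        }

        String getMetadata(String key) {
            return values.get(key);
        }
    }

    /** Stand-in for the exception named in the proposed interface. */
    static class NamingConventionException extends Exception {
        NamingConventionException(String message) {
            super(message);
        }
    }

    /** The interface as quoted in the issue description. */
    interface NamingConvention {
        File rename(File file, Metadata metadata) throws NamingConventionException;
    }

    /** Hypothetical convention: prefix the file name with one metadata value. */
    static class PrefixNamingConvention implements NamingConvention {
        private final String key;

        PrefixNamingConvention(String key) {
            this.key = key;
        }

        @Override
        public File rename(File file, Metadata metadata) throws NamingConventionException {
            String prefix = metadata.getMetadata(key);
            if (prefix == null) {
                throw new NamingConventionException("Missing metadata key: " + key);
            }
            // Keep the file in its current directory and change only its name.
            File target = new File(file.getParentFile(), prefix + "_" + file.getName());
            if (!file.renameTo(target)) {
                throw new NamingConventionException("Could not rename " + file + " to " + target);
            }
            return target;
        }
    }

    public static void main(String[] args) throws Exception {
        File original = File.createTempFile("product", ".dat");
        Metadata metadata = new Metadata();
        metadata.addMetadata("ProductType", "GenericFile");

        File renamed = new PrefixNamingConvention("ProductType").rename(original, metadata);
        System.out.println(renamed.getName());
        renamed.delete(); // clean up the temporary file
    }
}
```

Returning the renamed File, as the proposed signature does, lets the caller continue working with the new path without recomputing it.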

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235416#comment-13235416 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6215
-----------------------------------------------------------


Looking good; happy to review unit tests when they are there. LGTM, dude, you rule!

- Chris


On 2012-03-22 06:09:52, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-22 06:09:52)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1302790 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to unit-test cas-crawler
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.
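
The flow described in the review summary and the issue description (preconditions first, then the naming convention, then ingest, with handleFile reporting the outcome) could be sketched roughly as follows. This is an illustration under stated assumptions, not the crawler's actual code: the Outcome enum is a hypothetical stand-in for IngestResult, preconditions are modelled as plain predicates, the naming convention is modelled as a function that only computes the new path, and ingestion is reduced to recording the file name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class HandleFileSketch {

    /** Hypothetical stand-in for the IngestResult mentioned in the review. */
    enum Outcome { INGESTED, PRECONDITIONS_FAILED, RENAME_FAILED }

    private final List<Predicate<File>> preconditions;
    private final UnaryOperator<File> namingConvention;
    private final List<String> ingested = new ArrayList<>();

    HandleFileSketch(List<Predicate<File>> preconditions,
                     UnaryOperator<File> namingConvention) {
        this.preconditions = preconditions;
        this.namingConvention = namingConvention;
    }

    /** Public entry point that reports what happened to the file. */
    public Outcome handleFile(File file) {
        // 1. Every precondition must pass before anything else happens.
        for (Predicate<File> precondition : preconditions) {
            if (!precondition.test(file)) {
                return Outcome.PRECONDITIONS_FAILED;
            }
        }
        // 2. Only then is the naming convention applied.
        File renamed;
        try {
            renamed = namingConvention.apply(file);
        } catch (RuntimeException e) {
            return Outcome.RENAME_FAILED;
        }
        // 3. Placeholder for ingestion: just record the final name.
        ingested.add(renamed.getName());
        return Outcome.INGESTED;
    }

    List<String> ingestedNames() {
        return ingested;
    }

    public static void main(String[] args) {
        Predicate<File> isDataFile = f -> f.getName().endsWith(".dat");
        List<Predicate<File>> preconditions = new ArrayList<>();
        preconditions.add(isDataFile);
        UnaryOperator<File> addPrefix =
                f -> new File(f.getParentFile(), "renamed_" + f.getName());

        HandleFileSketch crawler = new HandleFileSketch(preconditions, addPrefix);
        System.out.println(crawler.handleFile(new File("data/a.dat")));
        System.out.println(crawler.handleFile(new File("data/a.txt")));
        System.out.println(crawler.ingestedNames());
    }
}
```

Returning an outcome value instead of void is what makes a method like this straightforward to unit-test, which fits the stated plan to add tests for cas-crawler.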


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow files to be renamed. CAS-Crawler will then be modified to support specified NamingConventions, which will run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature CAS-PGE supports that CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>    public File rename(File file, Metadata metadata)
>          throws NamingConventionException;
> }
> {code}


        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236432#comment-13236432 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------



bq.  On 2012-03-22 07:15:57, Chris Mattmann wrote:
bq.  > trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java, line 90
bq.  > <https://reviews.apache.org/r/4444/diff/1/?file=94495#file94495line90>
bq.  >
bq.  >     Does this make it so that the Exception is never thrown and that this is the default?

This just allows you to specify a default. See the other comment about the exception.


- brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6214
-----------------------------------------------------------




                

        

[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Foster updated OODT-426:
------------------------------

    Attachment: OODT-426.2012-03-20.cas-metadata.patch.txt

- attached cas-metadata part of patch
                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-metadata.patch.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow files to be renamed. CAS-Crawler will then be modified to support specified NamingConventions, which will run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature CAS-PGE supports that CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>    public File rename(File file, Metadata metadata)
>          throws NamingConventionException;
> }
> {code}


        

[jira] [Updated] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "Brian Foster (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Foster updated OODT-426:
------------------------------

    Attachment: OODT-426.2012-03-20.cas-crawler.patch.txt

- attached cas-crawler part of patch
                

        

[jira] [Commented] (OODT-426) Introduce a CAS-Metadata based renaming interface

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OODT-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238502#comment-13238502 ] 

jiraposter@reviews.apache.org commented on OODT-426:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4444/#review6349
-----------------------------------------------------------

Ship it!


My comments are pretty minor, but check 'em out. LGTM.


trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13774>

    Are all of these @VisibleForTesting annotations coupling our test system too tightly to the code? Just wondering...



trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java
<https://reviews.apache.org/r/4444/#comment13775>

    Should we augment the ProductCrawler superclass to declare this function as an abstract method, since all subclass crawlers implement it?
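
    As a rough illustration of that suggestion, the sketch below declares a hook as abstract in a base class so that every concrete crawler is forced to implement it. The review comment does not name the function in question, so all names here (BaseCrawler, metadataSourceFor, StdCrawler, ExtractorCrawler) are hypothetical and do not reflect the real ProductCrawler API.

```java
import java.io.File;

public class AbstractHookSketch {

    /** Hypothetical base class standing in for the crawler superclass. */
    abstract static class BaseCrawler {

        /** Hook that every concrete crawler must supply (hypothetical name). */
        protected abstract String metadataSourceFor(File product);

        /** Shared logic in the base class can rely on the hook existing. */
        public String describe(File product) {
            return product.getName() + " <- " + metadataSourceFor(product);
        }
    }

    /** Hypothetical subclass that reads metadata from a sidecar file. */
    static class StdCrawler extends BaseCrawler {
        @Override
        protected String metadataSourceFor(File product) {
            return product.getName() + ".met";
        }
    }

    /** Hypothetical subclass that delegates to an extractor. */
    static class ExtractorCrawler extends BaseCrawler {
        @Override
        protected String metadataSourceFor(File product) {
            return "met extractor";
        }
    }

    public static void main(String[] args) {
        File product = new File("data/a.dat");
        System.out.println(new StdCrawler().describe(product));
        System.out.println(new ExtractorCrawler().describe(product));
    }
}
```

    With the method declared abstract, a subclass that forgets to implement it fails at compile time rather than at run time.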


- Chris


On 2012-03-25 01:55:32, brian Foster wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4444/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-25 01:55:32)
bq.  
bq.  
bq.  Review request for oodt, Chris Mattmann, Ricky Nguyen, Paul Ramirez, and Thomas Bennett.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  - Introduced NamingConvention support: MetExtractorProductCrawler now takes the ID of the NamingConvention to use, and AutoDetectProductCrawler has a new element <namingConvention class=""/> in MimeExtractorRepo
bq.  - Also cleaned up handleFile(File): it is better documented, is now public, and returns the IngestResult describing what happened when it was called.
bq.  
bq.  
bq.  This addresses bug OODT-426.
bq.      https://issues.apache.org/jira/browse/OODT-426
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/crawler/src/main/resources/naming-beans.xml PRE-CREATION 
bq.    trunk/crawler/src/test/org/apache/oodt/cas/crawl/TestProductCrawler.java PRE-CREATION 
bq.    trunk/crawler/src/main/resources/cmd-line-options.xml 1302790 
bq.    trunk/crawler/src/main/resources/crawler-config.xml 1302790 
bq.    trunk/crawler/src/main/resources/examples/mime-extractor-map.xml 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/StdProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigMetKeys.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorConfigReader.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/ProductCrawler.java 1302790 
bq.    trunk/crawler/pom.xml 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java 1302790 
bq.    trunk/crawler/src/main/java/org/apache/oodt/cas/crawl/MetExtractorProductCrawler.java 1302790 
bq.  
bq.  Diff: https://reviews.apache.org/r/4444/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Still need to unit-test cas-crawler
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  brian
bq.  
bq.


                
> Introduce a CAS-Metadata based renaming interface
> -------------------------------------------------
>
>                 Key: OODT-426
>                 URL: https://issues.apache.org/jira/browse/OODT-426
>             Project: OODT
>          Issue Type: Sub-task
>          Components: crawler, metadata container, pge wrapper framework
>    Affects Versions: 0.3
>         Environment: none
>            Reporter: Brian Foster
>            Assignee: Brian Foster
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: OODT-426.2012-03-20.cas-crawler.patch.txt, OODT-426.2012-03-20.cas-metadata.patch.txt, OODT-426.2012-03-24.cas-crawler.patch.txt
>
>
> The idea here is that CAS-Metadata will introduce a new NamingConvention interface, which will allow files to be renamed. CAS-Crawler will then be modified to support specified NamingConventions, which will run after all preconditions have passed for a given file. This will allow CAS-PGE to use AutoDetectProductCrawler instead of StdProductCrawler, which will standardize file extraction across the board (currently CAS-PGE has its own file extraction interface, which uses regular expressions to determine which files should be extracted and ingested). The only feature CAS-PGE supports that CAS-Crawler lacks is file renaming, which this new NamingConvention interface will introduce. Here is what the NamingConvention interface will look like:
> {code}
> public interface NamingConvention {
>    public File rename(File file, Metadata metadata)
>          throws NamingConventionException;
> }
> {code}
