Posted to dev@manifoldcf.apache.org by "Jack Krupansky (JIRA)" <ji...@apache.org> on 2010/10/14 01:36:32 UTC

[jira] Issue Comment Edited: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

    [ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801 ] 

Jack Krupansky edited comment on CONNECTORS-118 at 10/13/10 7:35 PM:
---------------------------------------------------------------------

I have personally written unit tests that generated files in most of those formats, which Aperture then extracted.

See:
http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

org.apache.tools.bzip2        - BZIP2 archives.
java.util.zip.GZIPInputStream - GZIP archives.
javax.mail                    - message/rfc822-style messages and mbox files.
org.apache.tools.tar          - tar archives.
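
For illustration only, here is a rough sketch of what expanding an archive into its constituent files could look like with nothing but the JDK classes in java.util.zip. This is not Aperture's code and not a proposed ManifoldCF API. The class name ArchiveExpansionSketch and the helpers readAll, gunzip, expandZip, gzip and zip are hypothetical names invented for this example, and it works on in-memory byte arrays purely to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: none of these names come from Aperture or ManifoldCF.
public class ArchiveExpansionSketch {

    // Reads a stream until it reports end of data.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // GZIP wraps a single stream, so "expanding" it yields one payload.
    public static byte[] gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return readAll(in);
        }
    }

    // Flattens a zip archive into entry name -> content, skipping directories.
    public static Map<String, byte[]> expandZip(byte[] zip) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // read() returns -1 at the end of the current entry
                    files.put(entry.getName(), readAll(in));
                }
            }
        }
        return files;
    }

    // Test fixture: compresses a byte array with GZIP.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Test fixture: builds a zip archive from entry name -> text content.
    public static byte[] zip(Map<String, String> entries) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("docs/a.txt", "first file");
        source.put("docs/b.txt", "second file");
        // Each constituent file would be handed on as its own document.
        for (Map.Entry<String, byte[]> f : expandZip(zip(source)).entrySet()) {
            System.out.println(f.getKey() + " (" + f.getValue().length + " bytes)");
        }
    }
}
```

The tar, BZIP2 and mbox cases would follow the same loop-over-entries shape using the packages listed above; they are left out here because they need libraries outside the JDK.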



      was (Author: jkrupan):
    One of those VFS links points to all the Java packages used to access the list of archive formats I listed. I have personally written unit tests that generated most of those formats which Aperture then extracted.

  
> Crawled archive files should be expanded into their constituent files
> ---------------------------------------------------------------------
>
>                 Key: CONNECTORS-118
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-118
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>            Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their constituent files during crawling of repositories so that any output connector would output the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement a "copy" connector that maintains crawled files as-is.
