Posted to dev@manifoldcf.apache.org by "Jack Krupansky (JIRA)" <ji...@apache.org> on 2010/10/14 01:36:32 UTC
[jira] Issue Comment Edited: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801 ]
Jack Krupansky edited comment on CONNECTORS-118 at 10/13/10 7:35 PM:
---------------------------------------------------------------------
I have personally written unit tests that generated most of those formats which Aperture then extracted.
See:
http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers
org.apache.tools.bzip2 - BZIP2 archives.
java.util.zip.GZIPInputStream - GZIP archives.
javax.mail - message/rfc822-style messages and mbox files.
org.apache.tools.tar - tar archives.
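As a minimal illustration of the one JDK-only entry in that list, the sketch below decompresses a GZIP stream with java.util.zip.GZIPInputStream. A .gz file typically wraps a single compressed file, so its "constituent file" is simply the decompressed bytes. The class and method names (GzipExpandSketch, gunzip, gzip) are hypothetical; this is not Aperture's or ManifoldCF's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipExpandSketch {

    // Decompresses a GZIP stream fully into memory (hypothetical helper).
    static byte[] gunzip(InputStream compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            // read() returns -1 once the compressed data is exhausted
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Test helper: compresses bytes so the example needs no files on disk.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello archive".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(new ByteArrayInputStream(gzip(original)));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Running main prints "hello archive". The other formats listed (bzip2, tar, mbox) need the Ant or JavaMail packages named above rather than the JDK alone.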
was (Author: jkrupan):
One of those VFS links points to all the Java packages used to access the list of archive formats I listed. I have personally written unit tests that generated most of those formats which Aperture then extracted.
> Crawled archive files should be expanded into their constituent files
> ---------------------------------------------------------------------
>
> Key: CONNECTORS-118
> URL: https://issues.apache.org/jira/browse/CONNECTORS-118
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Framework crawler agent
> Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their constituent files during crawling of repositories so that any output connector would output the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement a "copy" connector that maintains crawled files as-is.
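A minimal sketch of the proposed behavior, assuming a hypothetical hook where the crawler inspects each fetched file before handing it to the output connector. It handles ZIP only, via java.util.zip.ZipInputStream. The expandArchives flag (defaulting to true, as the issue suggests), the "archiveName!entryName" naming scheme, and all class and method names are illustrative assumptions, not ManifoldCF APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchiveFlattenSketch {

    // Default mirrors the proposal: expansion is ON unless explicitly disabled.
    static Map<String, byte[]> documentsFor(String name, byte[] content) throws IOException {
        return documentsFor(name, content, true);
    }

    // Returns the documents a crawler would hand to an output connector.
    // Expanded entries are keyed "archiveName!entryName" (hypothetical naming).
    static Map<String, byte[]> documentsFor(String name, byte[] content, boolean expandArchives)
            throws IOException {
        Map<String, byte[]> docs = new LinkedHashMap<>();
        boolean isZip = name.toLowerCase(Locale.ROOT).endsWith(".zip");
        if (!expandArchives || !isZip) {
            docs.put(name, content); // pass through unchanged, e.g. for a "copy" connector
            return docs;
        }
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(content))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content of their own
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zin.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                docs.put(name + "!" + entry.getName(), out.toByteArray());
            }
        }
        return docs;
    }

    // Test helper: builds a ZIP archive in memory from name -> text pairs.
    static byte[] buildZip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zout.putNextEntry(new ZipEntry(f.getKey()));
                zout.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "alpha");
        files.put("docs/b.txt", "beta");
        byte[] zip = buildZip(files);

        System.out.println(documentsFor("reports.zip", zip).keySet());
        System.out.println(documentsFor("reports.zip", zip, false).keySet());
    }
}
```

With expansion on, main prints [reports.zip!a.txt, reports.zip!docs/b.txt]; with it off, the archive passes through as the single document [reports.zip]. Nested archives, tar, and mbox would need recursion and additional readers, which this sketch leaves out.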
--
This message is automatically generated by JIRA.