Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/05/22 23:59:45 UTC

[jira] Commented: (TIKA-232) Scanning of archive files

    [ https://issues.apache.org/jira/browse/TIKA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712283#action_12712283 ] 

Jukka Zitting commented on TIKA-232:
------------------------------------

If you're instantiating the package parsers directly, then you can achieve this simply by overriding the parser that is used for the files inside a package:

    PackageParser parser = ...;
    parser.setParser(new EmptyParser());

You could also use the following hack to do this for a pre-configured composite parser like the AutoDetectParser:

    CompositeParser composite = new AutoDetectParser();
    for (Parser parser : composite.getParsers().values()) {
        if (parser instanceof PackageParser) {
            ((PackageParser) parser).setParser(new EmptyParser());
        }
    }
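
To make the effect of that loop concrete without needing Tika on the classpath, here is a minimal, self-contained sketch. Every type in it (Parser, EmptyParser, TextParser, PackageParser, CompositeParser) is a simplified, hypothetical stand-in and not the real Tika API; only the setParser(...) and getParsers().values() calls mirror the snippets above. The helper name disableEntryContent is likewise made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingOnlySketch {

    // Hypothetical stand-in for Tika's Parser; the real interface has a different signature.
    interface Parser {
        String parse(String name);
    }

    // Stand-in for EmptyParser: extracts nothing from whatever it is given.
    static class EmptyParser implements Parser {
        public String parse(String name) {
            return "";
        }
    }

    // Stand-in for an ordinary content parser.
    static class TextParser implements Parser {
        public String parse(String name) {
            return "text of " + name;
        }
    }

    // Stand-in for PackageParser: lists each entry name, then delegates
    // the entry's content to the nested parser.
    static class PackageParser implements Parser {
        private final List<String> entries;
        private Parser nested = new TextParser();

        PackageParser(List<String> entries) {
            this.entries = entries;
        }

        void setParser(Parser parser) {
            this.nested = parser;
        }

        public String parse(String archiveName) {
            StringBuilder out = new StringBuilder();
            for (String entry : entries) {
                out.append(entry).append('\n');
                String content = nested.parse(entry);
                if (!content.isEmpty()) {
                    out.append(content).append('\n');
                }
            }
            return out.toString();
        }
    }

    // Stand-in for a composite parser keyed by media type.
    static class CompositeParser {
        private final Map<String, Parser> parsers = new LinkedHashMap<>();

        Map<String, Parser> getParsers() {
            return parsers;
        }
    }

    // Same loop as in the message above, wrapped in a helper that reports
    // how many package parsers were switched to the empty parser.
    static int disableEntryContent(CompositeParser composite) {
        int changed = 0;
        for (Parser parser : composite.getParsers().values()) {
            if (parser instanceof PackageParser) {
                ((PackageParser) parser).setParser(new EmptyParser());
                changed++;
            }
        }
        return changed;
    }
}
```

In this sketch, a PackageParser built with the entries "a.txt" and "b.txt" returns the entry names interleaved with their text before disableEntryContent is called, and only the two entry names afterwards. Parsers in the composite that are not package parsers are left untouched.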

Perhaps someone has a good idea how to make this easier?

> Scanning of archive files
> -------------------------
>
>                 Key: TIKA-232
>                 URL: https://issues.apache.org/jira/browse/TIKA-232
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.3
>         Environment: All
>            Reporter: Karl Heinz Marbaise
>            Priority: Minor
>
> If I parse an archive, all the files inside the archive are extracted along with their text. It would be nice to have the choice of extracting only the list of files (the directory) of an archive instead of extracting the whole contents. This seems to be applicable only to zip, tar, tar.gz, tar.bz2 and .jar archives. Maybe this could be realized through a different call or a run-time configuration.
