Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2013/12/19 15:46:11 UTC

[jira] [Commented] (TIKA-1212) Recursive Extraction of Archive File

    [ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852938#comment-13852938 ] 

Tim Allison commented on TIKA-1212:
-----------------------------------

On the first issue: do you mean that you'd like a parameter that would unzip the abc.zip file but not the pqr.zip file? Or do you want to be able to select embedded document types that you don't want to recurse into?


> Recursive Extraction of Archive File
> ------------------------------------
>
>                 Key: TIKA-1212
>                 URL: https://issues.apache.org/jira/browse/TIKA-1212
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Vikram
>            Priority: Critical
>
> Please refer to the code: http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
> Requirement:
> -----------------
> abc.zip
>    ---> a.doc
>    ---> b.xls
>    ---> pqr.zip
>   -------------> m.ppt
> There are two issues with Tika:
> 1. How can extraction of an embedded doc be blocked separately and optionally?
> 2. When I extract recursively, the file name or resourceKeyName is not reported correctly. For example:
>     --> a.doc should have the value abc.zip/a.doc, and similarly for b.xls. This is fine, BUT m.ppt gets the resource file name pqr/m.ppt, which is WRONG. It should have the value abc.zip/pqr.zip/m.ppt.
>     --> Even for an embedded doc, only a random name is reported, without the proper file path.
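
To make the requested naming concrete, here is a minimal sketch of the expected behavior. It deliberately does not use the Tika API: it relies only on java.util.zip, and the class NestedPathSketch, the methods listPaths and zip, and the recurseInto predicate are hypothetical names chosen for illustration. It shows one way to carry the container path down through nested archives (issue 2) and to decide per archive whether to descend at all (issue 1), assuming nested archives can be recognized by a .zip suffix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class NestedPathSketch {

    // Lists every entry under its full container path, e.g. abc.zip/pqr.zip/m.ppt.
    // recurseInto decides, per nested archive path, whether to descend into it.
    public static List<String> listPaths(byte[] zipBytes, String containerPath,
                                         Predicate<String> recurseInto) throws IOException {
        List<String> paths = new ArrayList<>();
        walk(new ByteArrayInputStream(zipBytes), containerPath, recurseInto, paths);
        return paths;
    }

    private static void walk(InputStream in, String parentPath,
                             Predicate<String> recurseInto, List<String> paths) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Prefix the entry name with the path of the archive that contains it.
                String fullPath = parentPath + "/" + entry.getName();
                paths.add(fullPath);
                if (entry.getName().endsWith(".zip") && recurseInto.test(fullPath)) {
                    // Buffer the nested archive so closing the inner stream
                    // cannot close the outer one.
                    walk(new ByteArrayInputStream(readEntry(zin)), fullPath, recurseInto, paths);
                }
            }
        }
    }

    // Reads the remainder of the current zip entry into memory.
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Builds an in-memory zip from parallel arrays of names and contents.
    public static byte[] zip(String[] names, byte[][] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                zout.putNextEntry(new ZipEntry(names[i]));
                zout.write(contents[i]);
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] dummy = {1, 2, 3};
        byte[] pqr = zip(new String[] {"m.ppt"}, new byte[][] {dummy});
        byte[] abc = zip(new String[] {"a.doc", "b.xls", "pqr.zip"},
                         new byte[][] {dummy, dummy, pqr});

        // Recurse into every nested archive.
        listPaths(abc, "abc.zip", path -> true).forEach(System.out::println);
        // Block recursion: pqr.zip is listed but not opened.
        listPaths(abc, "abc.zip", path -> false).forEach(System.out::println);
    }
}
```

With the always-true predicate, m.ppt is reported as abc.zip/pqr.zip/m.ppt. With the always-false predicate, pqr.zip is listed but never opened. A real fix inside Tika would have to carry the equivalent of parentPath through its own embedded-document handling, which this sketch does not attempt to model.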



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)