Posted to dev@tika.apache.org by "Daniel Bonniot de Ruisselet (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/21 10:15:39 UTC
[jira] [Issue Comment Edited] (TIKA-877) Embedded document not extracted (regression)

    [ https://issues.apache.org/jira/browse/TIKA-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232846#comment-13232846 ] 

Daniel Bonniot de Ruisselet edited comment on TIKA-877 at 3/21/12 9:14 AM:
---------------------------------------------------------------------------

The regression was introduced by this commit:

r1221112 | nick | 2011-12-20 07:15:29 +0100 (Tue, 20 Dec 2011) | 1 line

TIKA-757 Tidy the OLE10Native extractor code now that POI has been upgraded

http://svn.apache.org/viewvc?view=revision&revision=1221112
                
> Embedded document not extracted (regression)
> --------------------------------------------
>
>                 Key: TIKA-877
>                 URL: https://issues.apache.org/jira/browse/TIKA-877
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Daniel Bonniot de Ruisselet
>            Assignee: Maxim Valyanskiy
>            Priority: Blocker
>              Labels: regression
>             Fix For: 1.1
>
>         Attachments: coffee.xls
>
>
> Testing the 1.1 rc, I believe I found a regression, hence the priority.
> {noformat}
> dbonniot-t520 /tmp/1.0 java -jar ../tika-app-1.0.jar -z ../coffee.xls 
> Extracting 'file0.wmf' (application/x-msmetafile)
> Extracting 'file1.wmf' (application/x-msmetafile)
> Extracting 'file2.wmf' (application/x-msmetafile)
> Extracting 'file3.wmf' (application/x-msmetafile)
> Extracting 'file4.png' (image/png)
> Extracting 'MBD002B040A.wps' (application/vnd.ms-works)
> Extracting 'file5.bin' (application/octet-stream)
> Extracting 'MBD00262FE3.unknown' (application/x-tika-msoffice)
> dbonniot-t520 /tmp/1.0 cd ../1.1
> dbonniot-t520 /tmp/1.1 java -jar ../tika-app-1.1.jar -z ../coffee.xls 
> Extracting 'file0.emf' (application/x-emf)
> Extracting 'file1.emf' (application/x-emf)
> Extracting 'file2.emf' (application/x-emf)
> Extracting 'file3.emf' (application/x-emf)
> Extracting 'file4.png' (image/png)
> Extracting 'MBD002B040A.wps' (application/vnd.ms-works)
> Extracting 'file5' (application/x-tika-msoffice-embedded)
> Extracting 'MBD00262FE3.unknown' (application/x-tika-msoffice)
> dbonniot-t520 /tmp/1.1 ls -l ../1.0/file5.bin ../1.1/file5 
> -rw-r--r-- 1 dbonniot dbonniot 2519 2012-03-18 21:51 ../1.0/file5.bin
> -rw-r--r-- 1 dbonniot dbonniot    0 2012-03-18 21:51 ../1.1/file5
> {noformat}
> Notice how 1.0 could extract the data for file5, but 1.1 creates an empty file instead.
> By the way, I do see improvements in 1.1 as well, congrats on that!

--
This message is automatically generated by JIRA.