Posted to dev@tika.apache.org by "Aeham Abushwashi (JIRA)" <ji...@apache.org> on 2015/06/30 13:41:04 UTC

[jira] [Comment Edited] (TIKA-1400) Extract Excel (xls, xlsx) headers and footers

    [ https://issues.apache.org/jira/browse/TIKA-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608152#comment-14608152 ] 

Aeham Abushwashi edited comment on TIKA-1400 at 6/30/15 11:40 AM:
------------------------------------------------------------------

I've attached a patch that includes the fix, a unit test verifying the fix for XLS files, and a second unit test confirming that header and footer extraction from XLSX files already works.
The test data files are attached separately in case they can't be extracted from the patch file.



> Extract Excel (xls, xlsx) headers and footers
> ---------------------------------------------
>
>                 Key: TIKA-1400
>                 URL: https://issues.apache.org/jira/browse/TIKA-1400
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: sunxingzhe
>         Attachments: SpreadsheetWithHeadersAndFooters.xls, SpreadsheetWithHeadersAndFooters.xlsx, TIKA-1400.patch, headers and  footers.xls
>
>
> When I parse an xls file,
> the content of its headers and footers is not extracted.
> The xlsx file has the same problem.


