Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/06/25 16:54:08 UTC

[jira] Commented: (TIKA-250) XLS parser does not extract empty sheet names

    [ https://issues.apache.org/jira/browse/TIKA-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724114#action_12724114 ] 

Jukka Zitting commented on TIKA-250:
------------------------------------

The currentSheet.isEmpty() conditional was added explicitly to avoid outputting empty sheets. Most Excel files out there have the three default worksheets, but in the majority of cases only the first sheet contains anything, and the output is cleaner if the empty extra sheets aren't included.

Are there real-world cases where the name of an empty sheet is an important part of the extracted text content? I would assume that any essential sheet contains at least some content besides the sheet name.
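
To make the trade-off concrete, here is a minimal sketch of the kind of check being discussed. It is not the actual ExcelExtractor code: SheetSketch, SheetOutputSketch, render and the includeEmptySheets flag are hypothetical names used only for illustration, and the only thing taken from the comment above is the idea of an isEmpty() test guarding the output of a sheet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser's per-sheet state; not Tika's real class.
final class SheetSketch {
    final String name;
    final List<String> cells = new ArrayList<>();

    SheetSketch(String name, String... cellValues) {
        this.name = name;
        for (String value : cellValues) {
            cells.add(value);
        }
    }

    // A sheet counts as empty when it holds no cell content at all.
    boolean isEmpty() {
        return cells.isEmpty();
    }
}

final class SheetOutputSketch {
    // Emits each sheet name followed by its cells, one per line.
    // With includeEmptySheets == false, empty sheets are skipped entirely,
    // so their names never reach the output (the behaviour reported in TIKA-250).
    static String render(List<SheetSketch> sheets, boolean includeEmptySheets) {
        StringBuilder out = new StringBuilder();
        for (SheetSketch sheet : sheets) {
            if (sheet.isEmpty() && !includeEmptySheets) {
                continue;
            }
            out.append(sheet.name).append('\n');
            for (String cell : sheet.cells) {
                out.append('\t').append(cell).append('\n');
            }
        }
        return out.toString();
    }
}
```

For a workbook with one populated sheet and two untouched default sheets, render(sheets, false) yields only the first sheet, while render(sheets, true) also lists the two empty sheet names. That difference is what the question above is about.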

> XLS parser does not extract empty sheet names
> ---------------------------------------------
>
>                 Key: TIKA-250
>                 URL: https://issues.apache.org/jira/browse/TIKA-250
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4
>            Reporter: Maxim Valyanskiy
>            Priority: Minor
>         Attachments: empty.patch
>
>
> ExcelExtractor misses sheet titles if the sheet is empty. The fix is trivial; patch attached.
