Posted to dev@tika.apache.org by "John (Jira)" <ji...@apache.org> on 2022/08/05 07:02:00 UTC

[jira] [Comment Edited] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

    [ https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575607#comment-17575607 ] 

John edited comment on TIKA-3829 at 8/5/22 7:01 AM:
----------------------------------------------------

OK. We will check and get back to you if we face this problem again.

Is there any way in Tika to exclude some file types from content extraction? They should also be excluded when they appear as embedded files inside other documents.


was (Author: JIRAUSER292452):
OK. We will check and get back to you if we face this problem again.

Is there any way in Tika to exclude some file types from scanning? They should also be excluded when they appear as embedded files inside other documents.

> java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3829
>                 URL: https://issues.apache.org/jira/browse/TIKA-3829
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.23
>            Reporter: John
>            Priority: Major
>
> We are getting the following exception while parsing a .doc file:
> WARN  Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> java.lang.IllegalArgumentException: The document is really a XLS file
>     at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:322)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:82)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:155)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  
> What is the meaning of this exception? When is it thrown?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)