Posted to dev@tika.apache.org by "Koutsoulis Philippe (JIRA)" <ji...@apache.org> on 2013/06/25 11:10:20 UTC
[jira] [Comment Edited] (TIKA-1138) Empty body and empty title with some TXT documents
[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692867#comment-13692867 ]
Koutsoulis Philippe edited comment on TIKA-1138 at 6/25/13 9:08 AM:
--------------------------------------------------------------------
Here is the output for [http://www.justice.gouv.fr/art_pix/declaration_sexe_20091016.xls]
{noformat}
Root Entry -
Book
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
CompObj <(0x01)CompObj>
Ole <(0x01)Ole>
{noformat}
Here is the output for [http://ge.ch/ssco_gestats/excel/deinfo_par_ht2004.xls]
{noformat}
Root Entry -
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
Book
{noformat}
Here is the output for [http://homepage.swissonline.ch/ccvaf1/stock_divers/palmares_ccvaf.xls]
{noformat}
Root Entry -
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
Book
{noformat}
Here is the output for [http://ge.ch/ssco_gestats/excel/refona_par_ht2006.xls]
{noformat}
Root Entry -
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
Book
{noformat}
Here is the output for [http://www.pfynschiessen.ch/TClassementgroupeinvite.xls]
{noformat}
Root Entry -
CompObj
SummaryInformation
Book
{noformat}
(on) These seem to be Excel 95 files; I will remove them from my list.
(i) I also renamed the issue.
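The Excel 95 guess rests on the stream names in the listings above: every file has a "Book" stream and none has a "Workbook" stream. A minimal sketch of that check follows, assuming the usual convention that Excel 97 and later (BIFF8) store the workbook in "Workbook" while Excel 5.0/95 (BIFF5/BIFF7) use "Book". The function name `guess_excel_generation` is hypothetical and is not part of Tika or POI.

```python
# Hypothetical helper, not part of Tika or POI: guess the Excel generation
# from the stream names found directly under the OLE2 root entry.

def guess_excel_generation(entry_names):
    """Return a rough label based on which workbook stream is present."""
    names = {name.strip() for name in entry_names}
    if "Workbook" in names:
        # Assumption: BIFF8 workbooks (Excel 97 and later) use "Workbook".
        # Checked first because a dual-format file may carry both streams.
        return "Excel 97 or later"
    if "Book" in names:
        # Assumption: BIFF5/BIFF7 workbooks (Excel 5.0/95) use "Book".
        return "Excel 5.0/95"
    return "unknown"


# Entry names copied from the last listing above.
listing = ["CompObj", "SummaryInformation", "Book"]
print(guess_excel_generation(listing))
```

This only looks at names, so it is a hint rather than proof; the stream contents would still need to be read to confirm the BIFF version.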
was (Author: philippe.koutsoulis):
Here is the output for [http://www.justice.gouv.fr/art_pix/declaration_sexe_20091016.xls]
{noformat}
Root Entry -
Book
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
CompObj <(0x01)CompObj>
Ole <(0x01)Ole>
{noformat}
Here is the output for [http://ge.ch/ssco_gestats/excel/deinfo_par_ht2004.xls]
{noformat}
Root Entry -
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
Book
{noformat}
Here is the output for [http://homepage.swissonline.ch/ccvaf1/stock_divers/palmares_ccvaf.xls]
{noformat}
Root Entry -
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
Book
{noformat}
Here is the output for [http://ge.ch/ssco_gestats/excel/refona_par_ht2006.xls]
{noformat}
Root Entry -
SummaryInformation <(0x05)SummaryInformation>
DocumentSummaryInformation <(0x05)DocumentSummaryInformation>
Book
{noformat}
Here is the output for [http://www.pfynschiessen.ch/TClassementgroupeinvite.xls]
{noformat}
Root Entry -
CompObj
SummaryInformation
Book
{noformat}
(on) These seem to be Excel 95 files; I will remove them from my list
and remove the issue for TXT files.
> Empty body and empty title with some TXT documents
> --------------------------------------------------
>
> Key: TIKA-1138
> URL: https://issues.apache.org/jira/browse/TIKA-1138
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Koutsoulis Philippe
>
> *No error in logs*
> *+Extract from my "Structured Text":+*
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> ...
> <title/>
> </head>
> <body/></html>
> {noformat}
> *+Files to reproduce+*
> [http://top1000.anthologeek.net/participants.current.txt]
> [http://www.gregdonner.org/workbench/wb_31rev.txt]