Posted to dev@tika.apache.org by "Aleksandr Dubinsky (JIRA)" <ji...@apache.org> on 2014/05/24 07:19:01 UTC
[jira] [Created] (TIKA-1309) RTF TextExtractor can ignore consecutive linebreaks
Aleksandr Dubinsky created TIKA-1309:
----------------------------------------
Summary: RTF TextExtractor can ignore consecutive linebreaks
Key: TIKA-1309
URL: https://issues.apache.org/jira/browse/TIKA-1309
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.5, 1.6
Reporter: Aleksandr Dubinsky
Some RTF files encode consecutive linebreaks as simply consecutive \par commands. However, org.apache.tika.parser.rtf.TextExtractor ignores the second \par.
The solution is to replace the following code at line 1158:
    } else if (equals("par")) {
        if (!ignored) {
            endParagraph(true);
        }
    }
with:
    } else if (equals("par")) {
        if (!ignored) {
            lazyStartParagraph();
            endParagraph(true);
        }
    }
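
To illustrate why the extra call would help, here is a minimal standalone sketch. It is not Tika's code: the class ParSketch, its inParagraph flag, and the token list are hypothetical, and it assumes that endParagraph only emits a closing tag when a paragraph is already open, while lazyStartParagraph opens one only when none is open. Under those assumptions a second consecutive \par has nothing to close unless a paragraph is started first.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical model of lazy paragraph handling; not Tika's actual code. */
public class ParSketch {
    private final StringBuilder out = new StringBuilder();
    private final boolean applyFix;
    private boolean inParagraph = false;

    ParSketch(boolean applyFix) {
        this.applyFix = applyFix;
    }

    // Opens a paragraph only if none is currently open.
    private void lazyStartParagraph() {
        if (!inParagraph) {
            out.append("<p>");
            inParagraph = true;
        }
    }

    // Closes the paragraph only if one is open (assumed behavior).
    private void endParagraph() {
        if (inParagraph) {
            out.append("</p>");
            inParagraph = false;
        }
    }

    // Text always goes inside a paragraph, opening one on demand.
    private void text(String s) {
        lazyStartParagraph();
        out.append(s);
    }

    // Models the \par branch, with or without the proposed extra call.
    private void par() {
        if (applyFix) {
            lazyStartParagraph();
        }
        endParagraph();
    }

    public static String render(boolean applyFix, List<String> tokens) {
        ParSketch model = new ParSketch(applyFix);
        for (String token : tokens) {
            if (token.equals("\\par")) {
                model.par();
            } else {
                model.text(token);
            }
        }
        model.endParagraph(); // close any paragraph still open at end of input
        return model.out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("first", "\\par", "\\par", "second");
        System.out.println(render(false, tokens)); // <p>first</p><p>second</p>
        System.out.println(render(true, tokens));  // <p>first</p><p></p><p>second</p>
    }
}
```

In this model the version without the fix drops the blank line between "first" and "second", while the version with the fix emits an empty paragraph for the second \par.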