Posted to dev@tika.apache.org by "Leo (JIRA)" <ji...@apache.org> on 2013/10/07 05:59:42 UTC
[jira] [Created] (TIKA-1181) RTFParser not keeping HTML font colors and underscore tags.
Leo created TIKA-1181:
-------------------------
Summary: RTFParser not keeping HTML font colors and underscore tags.
Key: TIKA-1181
URL: https://issues.apache.org/jira/browse/TIKA-1181
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Environment: Windows server 2008
Reporter: Leo
Hi,
I'm having problems with the code below: the HTML it produces from the RTF string contains neither the font colors nor the underline ("<u></u>") tags. Is there anything I can do to get them included?
Code:
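import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.rtf.RTFParser;
// Feed the RTF source (rtfString) to Tika's RTF parser.
InputStream in = new ByteArrayInputStream(rtfString.getBytes("UTF-8"));
RTFParser parser = new RTFParser();
Metadata metadata = new Metadata();
// Serialize the parser's SAX events into a string as XML (XHTML).
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
handler.setResult(new StreamResult(sw));
parser.parse(in, handler, metadata, new ParseContext());
String xhtml = sw.toString();
// Turn CRLF line breaks in the output into HTML <br> tags.
xhtml = xhtml.replaceAll("\r\n", "<br>\r\n");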
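One way to narrow this down is to look at which elements the parser actually emits, rather than at the serialized string. The sketch below is a hypothetical helper (the class name ElementCollector and the sample markup are illustrative assumptions, not part of Tika): a plain SAX DefaultHandler that records the local name of every element it receives. Here it is exercised with the JDK's own SAX parser on a hand-written XHTML string, so it runs without Tika on the classpath.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: collects the names of all elements it sees.
public class ElementCollector extends DefaultHandler {
    // TreeSet keeps the names sorted and de-duplicated.
    private final Set<String> names = new TreeSet<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Namespace-aware producers fill in localName; fall back to qName otherwise.
        names.add(localName != null && localName.length() > 0 ? localName : qName);
    }

    public Set<String> getNames() {
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative sample only; this is not actual RTFParser output.
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p><b>bold</b> and <u>underlined</u></p></body></html>";
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SAXParser sp = spf.newSAXParser();
        ElementCollector collector = new ElementCollector();
        sp.parse(new InputSource(new StringReader(sample)), collector);
        System.out.println(collector.getNames()); // prints [b, body, html, p, u]
    }
}
```

Because DefaultHandler implements ContentHandler, an ElementCollector can also be passed as the handler argument to parser.parse(in, handler, metadata, new ParseContext()) in place of the TransformerHandler. If "u" is missing from getNames() for an RTF input that does contain underlined text, the parser did not emit underline elements for it, which points at the parser rather than at the serialization step.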
Thanks for looking at it.
Leo
--
This message was sent by Atlassian JIRA
(v6.1#6144)