Posted to dev@tika.apache.org by "Leo (JIRA)" <ji...@apache.org> on 2013/10/07 05:59:42 UTC

[jira] [Created] (TIKA-1181) RTFParser not keeping HTML font colors and underscore tags.

Leo created TIKA-1181:
-------------------------

             Summary: RTFParser not keeping HTML font colors and underscore tags.
                 Key: TIKA-1181
                 URL: https://issues.apache.org/jira/browse/TIKA-1181
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
         Environment: Windows server 2008
            Reporter: Leo


Hi,

I'm having problems with the code below. The XHTML it produces from the RTF string contains neither the font colors nor the underline tags ("<u></u>"). Is there anything I can do to get them into the output?

Code:
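import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.rtf.RTFParser;

// rtfString holds the RTF document as a String
InputStream in = new ByteArrayInputStream(rtfString.getBytes("UTF-8"));

RTFParser parser = new RTFParser();
Metadata metadata = new Metadata();

// Serialize the parser's SAX events into a string
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
handler.setResult(new StreamResult(sw));

parser.parse(in, handler, metadata, new ParseContext());

String xhtml = sw.toString();

// Add a <br> in front of every line break in the output
xhtml = xhtml.replaceAll("\r\n", "<br>\r\n");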
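
To narrow down where the tags get lost, here is a minimal sketch that feeds hand-written SAX events (no Tika involved) through the same TransformerHandler setup as above. The class name HandlerCheck and the method serializeUnderlined are hypothetical, made up for this illustration only. If the <u> element shows up in the result, the serializer is presumably not the part dropping it, which would point at the events the parser emits.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, not part of Tika.
class HandlerCheck {

    // Sends <p><u>text</u></p> as SAX events through the serializer
    // configured the same way as in the code above.
    static String serializeUnderlined(String text) throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        handler.setResult(new StreamResult(sw));

        AttributesImpl noAttrs = new AttributesImpl();
        char[] chars = text.toCharArray();

        handler.startDocument();
        handler.startElement("", "p", "p", noAttrs);
        handler.startElement("", "u", "u", noAttrs);
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "u", "u");
        handler.endElement("", "p", "p");
        handler.endDocument();

        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print the serialized result so it can be inspected by eye
        System.out.println(serializeUnderlined("underlined text"));
    }
}
```

This only checks the serializer side. It says nothing about which elements RTFParser actually emits for underlined or colored text.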

Thanks for looking at it.
Leo


