Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/01/22 20:15:00 UTC
[jira] [Created] (TIKA-2550) ToTextHandler includes <style/> element content
Tim Allison created TIKA-2550:
---------------------------------
Summary: ToTextHandler includes <style/> element content
Key: TIKA-2550
URL: https://issues.apache.org/jira/browse/TIKA-2550
Project: Tika
Issue Type: Bug
Reporter: Tim Allison
When using the ToTextHandler to process .java files, the <style/> element content is included, e.g.:
{noformat}
testFile
code {
color: rgb(0,0,0); font-family: monospace; font-size: 12px; white-space: nowrap;
}
.java_plain {
color: rgb(0,0,0);
}
.java_keyword {
color: rgb(0,0,0); font-weight: bold;
}
.java_javadoc_tag {
color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic; font-weight: bold;
}
h1 {
font-family: sans-serif; font-size: 16pt; font-weight: bold; color: rgb(0,0,0); background: rgb(210,210,210); border: solid 1px black; padding: 5px; text-align: center;
}
.java_type {
color: rgb(0,44,221);
}
.java_literal {
color: rgb(188,0,0);
}
.java_javadoc_comment {
color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic;
}
.java_operator {
color: rgb(0,124,31);
}
.java_separator {
color: rgb(0,33,255);
}
.java_comment {
color: rgb(147,147,147); background-color: rgb(247,247,247);
}
testFile/*************************************************************************
* Compilation: javac HelloWorld.java
* Execution: java HelloWorld
*
* Prints "Hello, World". By tradition, this is everyone's first program.
*
*************************************************************************/
public class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello, World");
}
}
{noformat}
Is this what we want as the default behavior?
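For discussion, here is a minimal sketch of one possible alternative behavior: a SAX handler that collects character data but ignores anything inside a <style> element. This is not Tika's ToTextHandler and not a proposed patch. The class name StyleSkippingTextHandler and the helper extractText are hypothetical, and the sketch assumes the handler receives the stylesheet as the text content of an element named "style", as in the output above. It uses only JDK SAX classes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of Tika): collects text, dropping <style> content. */
class StyleSkippingTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int styleDepth = 0; // greater than 0 while inside a <style> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isStyle(localName, qName)) {
            styleDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isStyle(localName, qName) && styleDepth > 0) {
            styleDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only keep character data that is outside any <style> element.
        if (styleDepth == 0) {
            text.append(ch, start, length);
        }
    }

    private static boolean isStyle(String localName, String qName) {
        return "style".equalsIgnoreCase(localName) || "style".equalsIgnoreCase(qName);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    /** Parses a well-formed XHTML string and returns its text without <style> content. */
    static String extractText(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            StyleSkippingTextHandler handler = new StyleSkippingTextHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.toString();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }
}
```

With an input such as <html><head><title>testFile</title><style>code { color: rgb(0,0,0); }</style></head><body><p>Hello, World</p></body></html>, extractText would return the title and body text without the CSS rules. Whether the filtering belongs in the text handler or upstream in the parser that emits the <style/> element is a separate design question.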