Posted to dev@tika.apache.org by "Benoit Moreau (JIRA)" <ji...@apache.org> on 2014/04/21 11:47:15 UTC
[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975502#comment-13975502 ]
Benoit Moreau commented on TIKA-1224:
-------------------------------------
I'm disappointed because it does not work!
For example:
> java -jar tika-app-1.5.jar -t Test.java
Output is empty
> java -jar tika-app-1.5.jar -h Test.java
Output is strange
> java -jar tika-app-1.5.jar -T Test.java
Output is what I would expect from -h?
{code:xml}
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="htt
p://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv=
"content-type" content="text/html; charset=ISO-8859-1" /> <meta name="genera
tor" content="JHighlight v1.0 (http://jhighlight.dev.java.net)" /> <title>Te
st.java</title> <link rel="Help" href="http://jhighlight.dev.java.net" />
<style type="text/css"> .java_type { color: rgb(0,44,221); } .java_keyword { c
olor: rgb(0,0,0); font-weight: bold; } .java_javadoc_comment { color: rgb(147,14
7,147); background-color: rgb(247,247,247); font-style: italic; } .java_comment
{ color: rgb(147,147,147); background-color: rgb(247,247,247); } .java_operator
{ color: rgb(0,124,31); } .java_plain { color: rgb(0,0,0); } .java_literal { col
or: rgb(188,0,0); } code { color: rgb(0,0,0); font-family: monospace; font-size:
12px; white-space: nowrap; } .java_javadoc_tag { color: rgb(147,147,147); backg
round-color: rgb(247,247,247); font-style: italic; font-weight: bold; } .java_se
parator { color: rgb(0,33,255); } h1 { font-family: sans-serif; font-size: 16pt;
font-weight: bold; color: rgb(0,0,0); background: rgb(210,210,210); border: sol
id 1px black; padding: 5px; text-align: center; } </style> </head> <body> <h
1>Test.java</h1><code><span class="java_javadoc_comment">/** * Class&n
bsp;Test. * * </span><span class="java_javadoc_tag">@author</span
><span class="java_javadoc_comment"> ben.12 */</span><span class="java
_keyword">public</span><span class="java_plain"> </span><span class="java_k
eyword">class</span><span class="java_plain"> </span><span class="java_type
">Test</span><span class="java_plain"> </span><span class="java_separator">
{</span><span class="java_plain"> </span><span class="java_comment">/
/ Class Test}</span><br /> </code> </body> </html>
{code}
But everything is on a single line, the indentation is lost, and the file name appears at the beginning.
The author is not in the head meta tags.
The last "}" is highlighted as a comment.
\\
My input Java file:
{code:title=Test.java}
/**
* Class Test.
*
* @author ben.12
*/
public class Test {
// Class Test
}
{code}
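To make the expectation about the author concrete: a minimal sketch, using only the JDK, of how the @author tag from the file above could be picked up and turned into a head meta tag. This is not Tika or JHighlight code. The class name JavadocAuthorSketch, both method names, and the meta name "author" are assumptions made for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or JHighlight.
public class JavadocAuthorSketch {

    // Matches "@author <name>" on a single line; the name runs to the end of that line.
    private static final Pattern AUTHOR =
            Pattern.compile("@author[ \\t]+(\\S.*?)[ \\t]*$", Pattern.MULTILINE);

    // Returns the first @author value found in the source, if any.
    public static Optional<String> extractAuthor(String source) {
        Matcher m = AUTHOR.matcher(source);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Renders the author as an XHTML meta tag; the name "author" is an assumption.
    public static String toMetaTag(String author) {
        String escaped = author
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
        return "<meta name=\"author\" content=\"" + escaped + "\" />";
    }

    public static void main(String[] args) {
        // Same content as Test.java above.
        String source = String.join("\n",
                "/**",
                "* Class Test.",
                "*",
                "* @author ben.12",
                "*/",
                "public class Test {",
                "// Class Test",
                "}");
        extractAuthor(source)
                .map(JavadocAuthorSketch::toMetaTag)
                .ifPresent(System.out::println);
    }
}
```

A real parser would more likely take the author from the highlighter's token stream than from a regular expression; the regex only illustrates the expected result.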
> Adding Source code (Java, Groovy, C) parser
> -------------------------------------------
>
> Key: TIKA-1224
> URL: https://issues.apache.org/jira/browse/TIKA-1224
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Reporter: Hong-Thai Nguyen
> Priority: Minor
>
> We can parse some source code file formats:
> text/x-java-source
> text/x-groovy
> text/x-c
> For HTML rendering of the code, we can use jhighlight: http://www.ohloh.net/p/jhighlight
--
This message was sent by Atlassian JIRA
(v6.2#6252)