Posted to dev@creadur.apache.org by co...@google.com on 2009/07/12 22:35:55 UTC
[apache-rat-pd commit] r39 - Decomposer of words is improved.
Author: maka82
Date: Sun Jul 12 13:34:46 2009
New Revision: 39
Modified:
trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
Log:
Decomposer of words is improved.
Modified: trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
==============================================================================
--- trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java (original)
+++ trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java Sun Jul 12 13:34:46 2009
@@ -201,10 +201,11 @@
private StringBuffer combineTokens(String[] tokens, int start, int end) {
StringBuffer sb = new StringBuffer();
- for (int k = start; k <= end; k++) {
+ for (int k = start; k < end; k++) {
sb.append(tokens[k]);
sb.append(" ");
}
+ sb.append(tokens[end]);
return sb;
}
@@ -212,6 +213,7 @@
* extract tokens
*/
private String[] tokeniseString(String file) {
+ file = file.replaceAll("\\n", "\n ");
String[] tokens = file.split(STRING_DELIMETER_REGEX);
// this simple tokeniser returns array {""} when "" is tokenised
// I must avoid that behavior
Modified: trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
==============================================================================
--- trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java (original)
+++ trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java Sun Jul 12 13:34:46 2009
@@ -33,7 +33,7 @@
* This regular expression match comments in Java. More info on:{@link}
* http://ostermiller.org/findcomment.html
*/
- private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(//.*[\\n\\r])";
+ private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";
public JavaCommentHeuristicChecker(int limit) {
super(JAVA_COMMENT_REGEX, limit);
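The first hunk above changes how combineTokens joins tokens: the old loop appended a space after every token, including the last one, while the new loop stops one short and appends the final token without a separator. The following is a minimal standalone sketch of that difference. The class name CombineTokensSketch and the method names combineOld and combineNew are hypothetical, and the sketch returns a String built with StringBuilder rather than the StringBuffer used in the commit.

```java
// Hypothetical standalone sketch; not part of the apache-rat-pd sources.
class CombineTokensSketch {

    // Loop as it was before r39: a space follows every token, including the last.
    static String combineOld(String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int k = start; k <= end; k++) {
            sb.append(tokens[k]);
            sb.append(" ");
        }
        return sb.toString();
    }

    // Loop as of r39: spaces only between tokens, last token appended on its own.
    static String combineNew(String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int k = start; k < end; k++) {
            sb.append(tokens[k]);
            sb.append(" ");
        }
        sb.append(tokens[end]);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"int", "x", "=", "1;"};
        // Brackets make the trailing space visible.
        System.out.println("[" + combineOld(tokens, 0, 3) + "]"); // prints "[int x = 1; ]"
        System.out.println("[" + combineNew(tokens, 0, 3) + "]"); // prints "[int x = 1;]"
    }
}
```

Like the committed code, combineNew assumes that end is a valid index into tokens.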
Re: [apache-rat-pd commit] r39 - Decomposer of words is improved.
Posted by Alexei Fedotov <al...@gmail.com>.
Marija, great.
I have a small piece of advice relating to your regular expressions,
JFYI. Java tools to parse code already exist and can be reused. For
now, I suggest leaving things as they are and just taking a look at
ANTLR and JavaCC [1], [2], [3], [4]. I believe our parser design does
not prevent us from plugging in these popular grammar compilers later.
[1] http://www.antlr.org/ (please note that the latest version of this
tool has a license that is incompatible with APL)
[2] http://www.antlr.org/grammar/list
[3] http://javacc.dev.java.net/
[4] http://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=110
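One reason a grammar-based parser can be worth a later look: a regular expression sees only characters, not Java tokens. The sketch below runs the r39 pattern over a small input and shows that it also matches `//` inside a string literal. The class name CommentRegexSketch, the helper findComments, and the sample input are hypothetical; only the pattern itself is copied from the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of the apache-rat-pd sources.
class CommentRegexSketch {

    // Pattern copied from r39 of JavaCommentHeuristicChecker.
    static final String JAVA_COMMENT_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";

    // Collects every substring of the source that the pattern matches.
    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(JAVA_COMMENT_REGEX).matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "/* real comment */\n"
                + "String url = \"http://example.org\";\n";
        for (String match : findComments(source)) {
            // trim() drops the line terminator that the pattern consumes.
            System.out.println("[" + match.trim() + "]");
        }
    }
}
```

On this input the pattern reports two matches: the block comment, and the tail of the string literal starting at `//example.org`, which is not a comment. A tokenising parser would treat the string literal as a single token and skip it.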
--
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://www.telecom-express.ru/
http://harmony.apache.org/
http://code.google.com/p/openmeetings/