Posted to dev@creadur.apache.org by co...@google.com on 2009/07/12 22:35:55 UTC

[apache-rat-pd commit] r39 - Decomposer of words is improved.

Author: maka82
Date: Sun Jul 12 13:34:46 2009
New Revision: 39

Modified:
    trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
    trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java

Log:
Decomposer of words is improved.

Modified: trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
==============================================================================
--- trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java	(original)
+++ trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java	Sun Jul 12 13:34:46 2009
@@ -201,10 +201,11 @@
  	private StringBuffer combineTokens(String[] tokens, int start, int end) {

  		StringBuffer sb = new StringBuffer();
-		for (int k = start; k <= end; k++) {
+		for (int k = start; k < end; k++) {
  			sb.append(tokens[k]);
  			sb.append(" ");
  		}
+		sb.append(tokens[end]);
  		return sb;
  	}

@@ -212,6 +213,7 @@
  	 * extract tokens
  	 */
  	private String[] tokeniseString(String file) {
+		file = file.replaceAll("\\n", "\n ");
  		String[] tokens = file.split(STRING_DELIMETER_REGEX);
  		// this simple tokeniser returns array {""} when "" is tokenised
  		// I must avoid that behavior
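
The two hunks above can be tried in isolation. What follows is a minimal sketch, not the project's code: the class name R39TokenSketch and the delimiter " +" are assumptions made for illustration (the real STRING_DELIMETER_REGEX is not shown in this commit). It suggests that the new combineTokens no longer leaves a trailing space, and that adding a space after each newline lets a space-based split separate words that were joined only by a line break.

```java
// Hypothetical stand-alone sketch of the r39 changes; not the real SourceCodeAnalyser.
final class R39TokenSketch {

    // Assumption: the real STRING_DELIMETER_REGEX is not visible in the diff.
    static final String ASSUMED_DELIMITER_REGEX = " +";

    // Loop as it was before r39: a space follows every token, including the last.
    static StringBuffer combineOld(String[] tokens, int start, int end) {
        StringBuffer sb = new StringBuffer();
        for (int k = start; k <= end; k++) {
            sb.append(tokens[k]);
            sb.append(" ");
        }
        return sb;
    }

    // Loop as of r39: spaces only between tokens, last token appended separately.
    static StringBuffer combineNew(String[] tokens, int start, int end) {
        StringBuffer sb = new StringBuffer();
        for (int k = start; k < end; k++) {
            sb.append(tokens[k]);
            sb.append(" ");
        }
        sb.append(tokens[end]);
        return sb;
    }

    // r39 adds the replaceAll call: each newline is followed by a space before splitting.
    static String[] tokenise(String file) {
        file = file.replaceAll("\\n", "\n ");
        return file.split(ASSUMED_DELIMITER_REGEX);
    }

    public static void main(String[] args) {
        String[] tokens = {"int", "x", "=", "1;"};
        System.out.println("[" + combineOld(tokens, 0, 3) + "]"); // prints [int x = 1; ]
        System.out.println("[" + combineNew(tokens, 0, 3) + "]"); // prints [int x = 1;]

        // Without the replacement, "a\nb" stays one token under the assumed delimiter.
        System.out.println("a\nb c".split(ASSUMED_DELIMITER_REGEX).length); // prints 2
        System.out.println(tokenise("a\nb c").length);                      // prints 3
    }
}
```

Note that the r39 loop always appends tokens[end], so unlike the old loop it produces output even when start is greater than end.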

Modified: trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
==============================================================================
--- trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java	(original)
+++ trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java	Sun Jul 12 13:34:46 2009
@@ -33,7 +33,7 @@
  	 * This regular expression match comments in Java. More info on:{@link}
  	 * http://ostermiller.org/findcomment.html
  	 */
-	private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(//.*[\\n\\r])";
+	private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";

  	public JavaCommentHeuristicChecker(int limit) {
  		super(JAVA_COMMENT_REGEX, limit);
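
The effect of the regex change can be checked with java.util.regex alone. This is a minimal sketch; the class name CommentRegexSketch and the sample input are assumptions made for illustration, and only the two pattern strings come from the diff. It suggests that the r39 pattern also consumes the line breaks that directly follow a block comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; only the two pattern strings come from the diff.
final class CommentRegexSketch {

    static final String OLD_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(//.*[\\n\\r])";
    static final String NEW_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";

    // Returns the first comment matched by the given pattern, or null if none.
    static String firstComment(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "/* a */\n\nint x;";
        // Old pattern stops at the closing "*/": 7 characters.
        System.out.println(firstComment(OLD_REGEX, source).length()); // prints 7
        // New pattern also swallows the two trailing newlines: 9 characters.
        System.out.println(firstComment(NEW_REGEX, source).length()); // prints 9
    }
}
```

In both versions the line-comment alternative requires a line terminator after the comment, so a "//" comment on the last line of a file without a final newline would not match.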

Re: [apache-rat-pd commit] r39 - Decomposer of words is improved.

Posted by Alexei Fedotov <al...@gmail.com>.
Marija, great.
I have a small piece of advice about your regular expressions, just
FYI: Java tools for parsing code already exist and can be reused. For
now, I suggest leaving things as they are and just taking a look at
ANTLR and JavaCC [1], [2], [3], [4]. I believe our parser design does
not prevent us from plugging in these popular grammar compilers later.

[1] http://www.antlr.org/ (please note that the latest version of this
tool has a license which is incompatible with the Apache License (APL))
[2] http://www.antlr.org/grammar/list
[3] http://javacc.dev.java.net/
[4] http://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=110
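
One case where a regex and a real lexer can disagree is a comment marker inside a string literal. This is a minimal sketch under stated assumptions: the class CommentScanSketch and its toy scanner are hypothetical, are not part of apache-rat-pd, and use neither ANTLR nor JavaCC. The scanner handles only double-quoted strings and block comments, which is enough to show why tracking lexer state matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the scanner below is a toy, not a full Java lexer.
final class CommentScanSketch {

    // Pattern string as committed in r39.
    static final String JAVA_COMMENT_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";

    // Regex-only approach: it has no notion of string literals.
    static List<String> commentsByRegex(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(JAVA_COMMENT_REGEX).matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // State-tracking approach: skip "..." literals, then collect /* ... */ comments.
    static List<String> commentsByScanner(String source) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (source.charAt(i) == '"') {
                i++; // step over the opening quote
                while (i < source.length() && source.charAt(i) != '"') {
                    if (source.charAt(i) == '\\') {
                        i++; // skip the escaped character
                    }
                    i++;
                }
                i++; // step over the closing quote
            } else if (source.startsWith("/*", i)) {
                int close = source.indexOf("*/", i + 2);
                int end = (close < 0) ? source.length() : close + 2;
                found.add(source.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "String s = \"/* not a comment */\"; /* real */\n";
        System.out.println(commentsByRegex(source).size());   // prints 2
        System.out.println(commentsByScanner(source).size()); // prints 1
    }
}
```

A generated lexer from a published Java grammar would cover the remaining cases (character literals, line comments, Unicode escapes) that this toy scanner ignores.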



-- 
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://www.telecom-express.ru/
http://harmony.apache.org/
http://code.google.com/p/openmeetings/