Posted to dev@lucene.apache.org by ot...@apache.org on 2004/03/02 14:56:03 UTC
cvs commit: jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cn ChineseTokenizer.java
otis 2004/03/02 05:56:03
Modified: contributions/analyzers/src/java/org/apache/lucene/analysis/cn
ChineseTokenizer.java
Log:
- Added documentation
Revision Changes Path
1.4 +18 -1 jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cn/ChineseTokenizer.java
Index: ChineseTokenizer.java
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/analyzers/src/java/org/apache/lucene/analysis/cn/ChineseTokenizer.java,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- ChineseTokenizer.java 22 Jan 2004 20:54:47 -0000 1.3
+++ ChineseTokenizer.java 2 Mar 2004 13:56:03 -0000 1.4
@@ -64,6 +64,23 @@
* Rule: A Chinese character as a single token
* Copyright: Copyright (c) 2001
* Company:
+ *
+ * The difference between the ChineseTokenizer and the
+ * CJKTokenizer (id=23545) is that they tokenize text
+ * differently.
+ *
+ * For example, if the Chinese text "C1C2C3C4" is to be
+ * indexed: the tokens returned from the ChineseTokenizer
+ * are C1, C2, C3, C4. The tokens returned from the
+ * CJKTokenizer are C1C2, C2C3, C3C4.
+ *
+ * Therefore the index created by the CJKTokenizer is
+ * much larger.
+ *
+ * The problem is that when searching for C1, C1C2, C1C3,
+ * C4C2, C1C2C3 ... the ChineseTokenizer works, but the
+ * CJKTokenizer will not.
+ *
* @author Yiyi Sun
* @version 1.0
*
@@ -149,4 +166,4 @@
}
}
-}
\ No newline at end of file
+}
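The unigram-versus-overlapping-bigram behaviour described in the new Javadoc can be sketched as follows. This is an illustrative sketch only, not the actual ChineseTokenizer or CJKTokenizer implementations; the class and method names are hypothetical, and each Chinese character is modelled as a string element so the output matches the C1..C4 notation used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizerComparison {

    // Unigram style (like ChineseTokenizer): one token per character.
    static List<String> unigrams(String[] chars) {
        return new ArrayList<>(Arrays.asList(chars));
    }

    // Overlapping bigram style (like CJKTokenizer): each token is a
    // pair of adjacent characters, so n characters yield n-1 tokens.
    static List<String> bigrams(String[] chars) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < chars.length; i++) {
            tokens.add(chars[i] + chars[i + 1]);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] text = {"C1", "C2", "C3", "C4"};
        System.out.println(unigrams(text)); // [C1, C2, C3, C4]
        System.out.println(bigrams(text));  // [C1C2, C2C3, C3C4]
    }
}
```

The sketch also shows why a single-character query such as C1 can miss in a bigram index: C1 alone never appears as a token there, only pairs like C1C2 do.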