Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/07/01 18:51:49 UTC
[jira] Updated: (LUCENE-2522) add simple japanese tokenizer, based on tinysegmenter
[ https://issues.apache.org/jira/browse/LUCENE-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2522:
--------------------------------
Attachment: LUCENE-2522.patch
Here is a quickly done patch, just to get started (not really for committing):
* converted their tests to BaseTokenStreamTestCase tests
* changed it to use CharTermAttribute instead of TermAttribute
* added clearAttributes()
* made the class final
* added a Solr factory
The code is nice and is set up to work on Unicode code points etc., but I think we can improve
it by using CharArrayMaps for speed and by using Lucene's code point I/O stuff in CharUtils.
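For readers unfamiliar with what "working on code points" means in Java, here is a minimal sketch using only the JDK. It is not taken from the patch: the class name CodePointWalk and both helper methods are hypothetical, and it does not use Lucene's own code point utilities. It only shows why a segmenter should step through text by code point rather than by char, since characters outside the BMP occupy two chars.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the LUCENE-2522 patch.
final class CodePointWalk {

    /** Returns the code points of text, reading each surrogate pair as one code point. */
    static List<Integer> codePoints(CharSequence text) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            out.add(cp);
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out;
    }

    /** Returns the Unicode script name of a code point, e.g. HAN, HIRAGANA, KATAKANA. */
    static String scriptOf(int cp) {
        return Character.UnicodeScript.of(cp).name();
    }

    public static void main(String[] args) {
        // U+20BB7 (a supplementary CJK ideograph) followed by U+91CE and U+5BB6.
        String text = "\uD842\uDFB7\u91CE\u5BB6";
        System.out.println("chars: " + text.length());
        for (int cp : codePoints(text)) {
            System.out.println(Integer.toHexString(cp) + " " + scriptOf(cp));
        }
    }
}
```

Iterating by char instead would split the first ideograph into two surrogate halves, and any per-character lookup keyed on those halves would miss.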
> add simple japanese tokenizer, based on tinysegmenter
> -----------------------------------------------------
>
> Key: LUCENE-2522
> URL: https://issues.apache.org/jira/browse/LUCENE-2522
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/analyzers
> Reporter: Robert Muir
> Priority: Minor
> Attachments: LUCENE-2522.patch
>
>
> TinySegmenter (http://www.chasen.org/~taku/software/TinySegmenter/) is a tiny Japanese segmenter.
> It was ported to Java/Lucene by Kohei TAKETA <k-...@void.in>,
> and is under friendly license terms (BSD; some files explicitly disclaim copyright to the source code, giving a blessing instead).
> Koji knows the author and has already contacted him about incorporating it into Lucene:
> {noformat}
> I've contacted Takeda-san who is the creator of the Java version of
> TinySegmenter. He said he is happy if his program is part of Lucene.
> He is a co-author of my book about Solr published in Japan, BTW. ;-)
> {noformat}