Posted to solr-dev@lucene.apache.org by Mark Bennett <mb...@ideaeng.com> on 2009/06/30 01:42:31 UTC

Anybody using Japanese SEN with recent versions of Solr ?

I've been reading through the SEN project docs and various Japanese blogs,
but I'm still having some issues.

In particular, it seems like perhaps you're supposed to have BOTH
sen-1.2.2.1 and lucene-ja-2.0test2 installed?

I guess the lucene-ja is an adapter layer between the org.apache.lucene
analyzers and base net.java Tokenizers, whereas sen-1.2.2.1 is the base SEN
package, and is not aware of Lucene/Solr.  So I guess you need both.

But both packages include Lucene classes, and the lucene-ja code seems to be
using a very old Lucene.  I'm not sure how you layer this all together with a
more recent Solr implementation.  (I'm using the nightly stable build.)

Or perhaps the older lucene-ja is intended to already include SEN?  It does
have some SEN files, but they are quite a bit older than the SEN 1.2.2.1
files, and you've still got the old Lucene version issue.

Any input would be appreciated.

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

Re: Anybody using Japanese SEN with recent versions of Solr ?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Mark,

I think you can develop your own tokenizer that calls sen to tokenize
Japanese sentences.
For reference, you can look at the source code of lucene-ja.
I think the source code is included in lucene-ja.jar, but I'm not sure.
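
For illustration only, here is a minimal, self-contained Java sketch of the
adapter idea: a tokenizer that delegates segmentation to a separate analyzer
and attaches character offsets to each term. It does not use the real Sen or
Lucene APIs. Segmenter, Term, ToySegmenter, and tokenize are hypothetical
names invented for this sketch; in a real integration the Segmenter role
would be played by sen, and the adapter would extend the Tokenizer class of
whatever Lucene version your Solr build ships with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmenterAdapterSketch {

    /** Hypothetical stand-in for the morphological analyzer (the role sen plays). */
    public interface Segmenter {
        List<String> segment(String text);
    }

    /** Hypothetical stand-in for the term-plus-offsets record an indexer consumes. */
    public static final class Term {
        public final String text;
        public final int start; // inclusive character offset in the input
        public final int end;   // exclusive character offset in the input

        public Term(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy segmenter: greedy longest match against a tiny dictionary,
     *  falling back to single characters. Not a real analyzer. */
    public static final class ToySegmenter implements Segmenter {
        private final Set<String> dictionary;

        public ToySegmenter(Set<String> dictionary) {
            this.dictionary = dictionary;
        }

        @Override
        public List<String> segment(String text) {
            List<String> out = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                int best = 1; // fall back to one character
                for (String word : dictionary) {
                    if (word.length() > best && text.startsWith(word, i)) {
                        best = word.length();
                    }
                }
                out.add(text.substring(i, i + best));
                i += best;
            }
            return out;
        }
    }

    /** The adapter step: ask the segmenter for surface forms, then locate
     *  each one in the input so every term carries its offsets. */
    public static List<Term> tokenize(Segmenter segmenter, String input) {
        List<Term> terms = new ArrayList<>();
        int cursor = 0;
        for (String surface : segmenter.segment(input)) {
            int start = input.indexOf(surface, cursor);
            if (start < 0) {
                continue; // surface form not present in the input; skip it
            }
            int end = start + surface.length();
            terms.add(new Term(surface, start, end));
            cursor = end;
        }
        return terms;
    }

    public static void main(String[] args) {
        // Dictionary entries: "Tokyo" and "sumu" (to live), as Unicode escapes.
        Set<String> dict = new HashSet<>(
                Arrays.asList("\u6771\u4eac", "\u4f4f\u3080"));
        // Input: "Tokyo-to ni sumu" written without spaces.
        String input = "\u6771\u4eac\u90fd\u306b\u4f4f\u3080";
        for (Term t : tokenize(new ToySegmenter(dict), input)) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

Keeping the analyzer behind a small interface like this means only the
adapter has to be rebuilt when the Lucene version changes, which is the
layering problem Mark describes.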

Koji

