Posted to dev@lucene.apache.org by Samphan Raruenrom <sa...@osdev.co.th> on 2006/02/22 04:51:33 UTC

ThaiAnalyzer for Lucene

Hi,

I've written an alpha version of ThaiAnalyzer to enable
Thai in Lucene full-text search.
Thai has no spaces between words (the same is true of Lao and Khmer),
so you need a dictionary-based word breaker to split the text into words.
I use ICU4J's DictionaryBasedBreakIterator for this job.
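
To illustrate the word-breaking step described above, here is a minimal sketch. It uses the JDK's own java.text.BreakIterator instead of the ICU4J DictionaryBasedBreakIterator named in the message; that substitution is an assumption made so the example runs without extra jars. The class name ThaiSegmentationSketch and the method words are hypothetical. How the Thai sample is split depends on the break data bundled with the JDK in use, so no expected output is given for it.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the ThaiAnalyzer code discussed in this thread.
class ThaiSegmentationSketch {

    /** Splits text into words with the JDK's locale-sensitive word BreakIterator. */
    static List<String> words(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getWordInstance(locale);
        breaker.setText(text);
        List<String> words = new ArrayList<>();
        int start = breaker.first();
        for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
            String segment = text.substring(start, end);
            // Keep segments containing a letter or digit; drop spaces and punctuation.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Thai for "Thai language", written with Unicode escapes; it contains no spaces.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        System.out.println(words(thai, Locale.forLanguageTag("th")));
        System.out.println(words("full text search", Locale.ENGLISH));
    }
}
```

For space-delimited text the iterator simply finds the gaps; for Thai it has to rely on a dictionary, which is why the choice of break-iterator library matters here.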

I want to contribute the code using the Apache license,
so it'll be useful to other people.
How can I do this?
I see analyzers for various languages in the Sandbox.
How can I put the code there?

Thanks.

-- 
_/|\_ Samphan Raruenrom. Open Source Development Co., Ltd.
Tel: +66 38 311816, Fax: +66 38 773128, http://www.osdev.co.th/


Re: ThaiAnalyzer for Lucene

Posted by Karel Tejnora <ka...@tejnora.cz>.
Hi guys,
    I have the same problem: my Czech analyzer depends on ICU4J.
My suggestion is to put an interface between your code and ICU4J, because the
new JDK 1.6 should include more features from ICU4J.
Samphan, you can also look at the Stempel algorithm at http://getopt.org/stempel/.
I don't know whether you can use it directly, but there may be some
hints there.
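
The idea of putting an interface between the analyzer and the break-iterator library could look roughly like the sketch below. All names (WordBoundaryFinder, JdkWordBoundaryFinder, SegmentCounter) are hypothetical and not from any posted code. Only a JDK-backed implementation is shown; an ICU4J-backed one would implement the same interface and could live in a separate, optional jar.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical seam: analyzer code depends on this, not on ICU4J or the JDK directly. */
interface WordBoundaryFinder {
    /** Returns boundary offsets in ascending order, from 0 up to text.length(). */
    List<Integer> boundaries(String text);
}

/** Implementation backed by java.text.BreakIterator. */
class JdkWordBoundaryFinder implements WordBoundaryFinder {
    private final Locale locale;

    JdkWordBoundaryFinder(Locale locale) {
        this.locale = locale;
    }

    @Override
    public List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }
}

/** Example caller that only knows the interface. */
class SegmentCounter {
    /** Counts the segments between consecutive boundaries (whitespace runs included). */
    static int countSegments(WordBoundaryFinder finder, String text) {
        return Math.max(0, finder.boundaries(text).size() - 1);
    }
}
```

With this shape, swapping the library means swapping one implementation class; the calling code does not change.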

Karel

>Please create an "issue" in JIRA, and attach your code to it.  We can put the analyzers in the contrib section of Lucene.
>I hope DictionaryBasedBreakIterator is not a compile-time dependency, because we probably can't distribute ICU4J due to the license.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ThaiAnalyzer for Lucene

Posted by Samphan Raruenrom <sa...@osdev.co.th>.
I'm a C++ programmer who is completely new to Java.
Can anyone tell me how I can use ICU4J without adding a compile-time
dependency? In C you'd use something like dlopen, but in Java, how
can I compile without ICU4J and still use it at run time?
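
One common answer to this kind of question is Java reflection, which plays roughly the role of dlopen/dlsym: the class is looked up by name at run time, so the source compiles without the library on the classpath. The sketch below is an illustration under assumptions, not the approach taken in the contributed code. It tries com.ibm.icu.text.BreakIterator and falls back to java.text.BreakIterator when ICU4J is absent; it assumes both classes expose getWordInstance(Locale), setText(String), first() and next(), and that both signal the end of iteration with -1. The class name ReflectiveWordBreaker is hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper: no import of ICU4J appears anywhere in this file.
class ReflectiveWordBreaker {
    private static final int DONE = -1; // assumed end marker of both BreakIterator classes

    private final Object iterator; // ICU4J or JDK BreakIterator, held as a plain Object
    private final Method setText;
    private final Method first;
    private final Method next;
    final String implementation;   // name of the class that was actually loaded

    ReflectiveWordBreaker(Locale locale) {
        try {
            Class<?> cls;
            try {
                cls = Class.forName("com.ibm.icu.text.BreakIterator"); // run-time lookup
            } catch (ClassNotFoundException e) {
                cls = Class.forName("java.text.BreakIterator");        // fallback in the JDK
            }
            implementation = cls.getName();
            // Static factory method, hence the null receiver.
            iterator = cls.getMethod("getWordInstance", Locale.class).invoke(null, locale);
            setText = cls.getMethod("setText", String.class);
            first = cls.getMethod("first");
            next = cls.getMethod("next");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable BreakIterator found", e);
        }
    }

    /** Returns the word-boundary offsets of text, in ascending order. */
    List<Integer> boundaries(String text) {
        try {
            setText.invoke(iterator, text);
            List<Integer> result = new ArrayList<>();
            for (int pos = (Integer) first.invoke(iterator); pos != DONE;
                     pos = (Integer) next.invoke(iterator)) {
                result.add(pos);
            }
            return result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("BreakIterator call failed", e);
        }
    }

    public static void main(String[] args) {
        ReflectiveWordBreaker breaker = new ReflectiveWordBreaker(Locale.ENGLISH);
        System.out.println(breaker.implementation + " " + breaker.boundaries("hello world"));
    }
}
```

Reflection gives up compile-time type checking, so a misspelled method name only fails at run time; hiding the reflective calls behind one small class keeps that risk contained.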

Thanks.

Otis Gospodnetic wrote:
> Hi Samphan,
>
> Please create an "issue" in JIRA, and attach your code to it.  We can put the analyzers in the contrib section of Lucene.
> I hope DictionaryBasedBreakIterator is not a compile-time dependency, because we probably can't distribute ICU4J due to the license.
>
> Otis


-- 
_/|\_ Samphan Raruenrom. Open Source Development Co., Ltd.
Tel: +66 38 311816, Fax: +66 38 773128, http://www.osdev.co.th/


Re: ThaiAnalyzer for Lucene

Posted by Samphan Raruenrom <sa...@osdev.co.th>.
I've finished the work. It no longer uses ICU4J. It's here:
http://issues.apache.org/jira/browse/LUCENE-503?page=all

To contribute the code, what should I do next?

Otis Gospodnetic wrote:
> Hi Samphan,
>
> Please create an "issue" in JIRA, and attach your code to it.  We can put the analyzers in the contrib section of Lucene.
> I hope DictionaryBasedBreakIterator is not a compile-time dependency, because we probably can't distribute ICU4J due to the license.
>
> Otis


-- 
_/|\_ Samphan Raruenrom. Open Source Development Co., Ltd.
Tel: +66 38 311816, Fax: +66 38 773128, http://www.osdev.co.th/


Re: ThaiAnalyzer for Lucene

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Samphan,

Please create an "issue" in JIRA, and attach your code to it.  We can put the analyzers in the contrib section of Lucene.
I hope DictionaryBasedBreakIterator is not a compile-time dependency, because we probably can't distribute ICU4J due to the license.

Otis





