Posted to dev@lucene.apache.org by Frank Nestel <in...@doris-frank.de> on 2001/10/12 12:52:31 UTC

Language recognition

Hi,

This came up in a thread back when Lucene was still on SourceForge.
I've done a rough but working port to Java of the text_cat Perl
script for n-gram based language guessing. If this is useful,
it can be found at

	http://frank.spieleck.de/ngram

There are javadocs and a jar file. The source code
is not yet available, since I apparently need to rewrite
a tiny but central class for copyright reasons. This is not
difficult, but I'm preparing for two weeks off right now,
so it won't happen soon.
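
To give a rough idea of what n-gram based language guessing means,
here is a minimal sketch in the spirit of text_cat. It is not the
code from the jar above: the class NGramGuesser, its methods, and the
constants MAX_N and PROFILE_SIZE are made up for illustration, and the
values chosen for them are assumptions. Each language gets a profile
that ranks its most frequent character n-grams, and a text is assigned
to the language whose ranking differs least from its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of rank-order n-gram language guessing. */
class NGramGuesser {
    private static final int MAX_N = 3;          // assumed: n-gram lengths 1..3
    private static final int PROFILE_SIZE = 300; // assumed: keep the 300 most frequent n-grams

    // language name -> (n-gram -> rank within that language's profile)
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Builds a rank map for a text; the most frequent n-gram gets rank 0. */
    static Map<String, Integer> profile(String text) {
        // Lower-case the text and turn every run of non-letters into one '_' word boundary.
        String clean = "_" + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", "_") + "_";

        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= clean.length(); i++) {
                counts.merge(clean.substring(i, i + n), 1, Integer::sum);
            }
        }

        // Sort by descending count; ties are broken alphabetically to stay deterministic.
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < grams.size() && r < PROFILE_SIZE; r++) {
            ranks.put(grams.get(r), r);
        }
        return ranks;
    }

    /** Stores the profile of a sample text under the given language name. */
    void train(String language, String sample) {
        profiles.put(language, profile(sample));
    }

    /** Returns the trained language with the smallest rank distance, or null if none is trained. */
    String guess(String text) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> lang : profiles.entrySet()) {
            long distance = 0;
            for (Map.Entry<String, Integer> gram : doc.entrySet()) {
                Integer rank = lang.getValue().get(gram.getKey());
                // An n-gram missing from the language profile costs the maximum penalty.
                distance += (rank == null) ? PROFILE_SIZE : Math.abs(rank - gram.getValue());
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = lang.getKey();
            }
        }
        return best;
    }
}
```

One would call train("en", sample) once per language with a reasonably
long sample text and then guess(text) on the document in question.
With very short samples or inputs the rankings are noisy, so the
result should be treated as a guess only.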

For me it was mostly an exercise, since I realized later
that there is still a gap before I can make it work in my SMIDER
project. If someone has a use for such a system, let me know
so I can readjust this task's priority for myself :-) Maybe
one would even consider this a potential part of Lucene?
Then I'd be glad to give the source code to Apache.

Regards,
Frank

-- 
------------------------------------------ooO---"---Ooo-------------------
info@doris-frank.de,          "I hate this game, lets play it again"
http://doris-frank.de,        http://duf.spieleck.de/mailman/listinfo
Dr. Frank  Sven  Nestel,      http://spieleck.de, http://frank.spieleck.de
Spiele von Doris und Frank, Wolfsstaudenring 32, D-91056 Erlangen, GERMANY

Re: Language recognition

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
That's great! I think our app may be getting ready for this in, say, 
three to six months.
