Posted to dev@lucene.apache.org by Frank Nestel <in...@doris-frank.de> on 2001/10/12 12:52:31 UTC
Language recognition
Hi,
This was a thread back when Lucene was still on SourceForge.
I've done a rough but working port of the text_cat Perl
script for n-gram based language guessing to Java. If this
is useful, it can be found at
http://frank.spieleck.de/ngram
There are javadocs and a jar file. The source code
is not yet available since I apparently need to rewrite
a tiny but central class for copyright reasons. This is not
difficult, but I'm preparing for two weeks off right now, so
it won't happen soon.
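For readers who have not seen the technique: guessers in the
text_cat family typically build a ranked list of the most frequent
character n-grams for each language and for the input text, then
pick the language whose ranking is closest to the input's. The
sketch below illustrates that idea only. It is not the code from the
jar mentioned above; the class name NGramGuesser, the n-gram lengths
1 to 3, the profile size of 300 and the tiny training sentences are
all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration class; not part of the ngram jar or of Lucene. */
final class NGramGuesser {
    static final int MAX_N = 3;          // assumed: use n-grams of length 1..3
    static final int PROFILE_SIZE = 300; // assumed: keep the 300 top-ranked n-grams

    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Maps the most frequent n-grams of the text to their rank (0 = most frequent). */
    static Map<String, Integer> profile(String text) {
        // Lowercase, then collapse every run of non-letters into one '_' word boundary.
        String s = ("_" + text.toLowerCase(Locale.ROOT) + "_").replaceAll("[^\\p{L}]+", "_");
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Descending by count; ties broken alphabetically so ranks are deterministic.
        sorted.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < sorted.size() && r < PROFILE_SIZE; r++) {
            ranks.put(sorted.get(r).getKey(), r);
        }
        return ranks;
    }

    /** Sums rank differences; an n-gram absent from the language costs PROFILE_SIZE. */
    static int distance(Map<String, Integer> doc, Map<String, Integer> lang) {
        int d = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer langRank = lang.get(e.getKey());
            d += (langRank == null) ? PROFILE_SIZE : Math.abs(langRank - e.getValue());
        }
        return d;
    }

    /** Stores a rank profile for the language, built from the sample text. */
    void train(String language, String sample) {
        profiles.put(language, profile(sample));
    }

    /** Returns the trained language with the smallest distance, or null if none is trained. */
    String guess(String text) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramGuesser guesser = new NGramGuesser();
        // Toy training data; real profiles need far larger samples per language.
        guesser.train("en", "the quick brown fox jumps over the lazy dog and the cat sat on the mat");
        guesser.train("de", "der schnelle braune fuchs springt \u00fcber den faulen hund "
                + "und die katze sitzt auf der matte");
        System.out.println(guesser.guess("The cat and the dog"));
    }
}
```

With training samples this small the guess is only reliable for
text that shares many n-grams with one sample. A usable guesser
would train each profile on a much larger body of text.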
For me it was just an exercise of sorts, since I later
realized that I still have a gap to fill before it works in my
SMIDER project. If someone has a use for such a system, let me know
so I can readjust this task's priority for myself :-) Maybe
one would even consider this a potential part of Lucene?
Then I'd be glad to give the source code to Apache.
Regards,
Frank
--
------------------------------------------ooO---"---Ooo-------------------
info@doris-frank.de, "I hate this game, lets play it again"
http://doris-frank.de, http://duf.spieleck.de/mailman/listinfo
Dr. Frank Sven Nestel, http://spieleck.de, http://frank.spieleck.de
Spiele von Doris und Frank, Wolfsstaudenring 32, D-91056 Erlangen, GERMANY
Re: Language recognition
Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
That's great! I think our app may be getting ready for this in, say,
three to six months.