Posted to java-user@lucene.apache.org by bhecht <bh...@ams-sys.com> on 2007/05/19 18:22:31 UTC
What is the best way to split substring words
Hi there,
I want to be able to split tokens by supplying a list of subwords.
For example, given a list of subwords like "strasse" and "gasse",
the token "mainstrasse" should be split into the 2 tokens "main" and
"strasse", and "maingasse" into "main" and "gasse".
Thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: What is the best way to split substring words
Posted by Soeren Pekrul <so...@gmx.de>.
bhecht wrote:
> I want to be able to split tokens by giving a list of substring words.
> So I can give a list of subwords like: "strasse", "gasse",
> And the token "mainstrasse" or "maingasse" will be split to 2 tokens "main"
> and "strasse".
IMBEMBA, PASQUALINO: A Splitter for German Compound Words. Free
University of Bolzano, Bozen, 2006
http://www.gossamer-threads.com/lists/lucene/java-user/40164?do=post_view_threaded
Re: What is the best way to split substring words
Posted by Erick Erickson <er...@gmail.com>.
You probably should write a custom analyzer and/or filter that breaks
your streams up into the custom tokens you want. Depending upon
what you're really trying to accomplish, you may well need to use the
same analyzer at BOTH index and search times.
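
To make the splitting step concrete, here is a minimal sketch in plain Java
of the logic such a filter might apply to each token. It is not Lucene code:
the class SubwordSplitter and its split method are hypothetical names, and
the sketch assumes that subwords only need to be matched at the end of a
token, case-sensitively, and that the longest matching subword wins. In a
real analyzer this logic would sit inside your own custom filter, which
would emit the resulting parts as separate tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Lucene API.
class SubwordSplitter {

    // Splits a token into [prefix, subword] when it ends with one of the
    // given subwords and has a non-empty prefix. Otherwise the token is
    // returned unchanged as a single-element list.
    // Assumption: suffix-only, case-sensitive matching; longest subword wins.
    static List<String> split(String token, List<String> subwords) {
        String best = null;
        for (String sub : subwords) {
            boolean hasPrefix = token.length() > sub.length();
            if (hasPrefix && token.endsWith(sub)) {
                if (best == null || sub.length() > best.length()) {
                    best = sub;
                }
            }
        }
        List<String> parts = new ArrayList<>();
        if (best == null) {
            parts.add(token);
        } else {
            parts.add(token.substring(0, token.length() - best.length()));
            parts.add(best);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> subwords = Arrays.asList("strasse", "gasse");
        for (String token : Arrays.asList("mainstrasse", "maingasse", "main", "gasse")) {
            System.out.println(token + " -> " + split(token, subwords));
        }
        // mainstrasse -> [main, strasse]
        // maingasse -> [main, gasse]
        // main -> [main]
        // gasse -> [gasse]
    }
}
```

If the same splitting is applied to query text as well, a search for
"strasse" or "main" can match documents that contained "mainstrasse", which
is why the index-time and search-time analysis need to agree.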
Best
Erick