Posted to java-user@lucene.apache.org by bhecht <bh...@ams-sys.com> on 2007/05/19 18:22:31 UTC

What is the best way to split substring words

Hi there,

I want to split tokens against a list of known subwords.
For example, given the subwords "strasse" and "gasse", the token
"mainstrasse" should be split into the two tokens "main" and "strasse",
and "maingasse" into "main" and "gasse".

Thanks

-- 
View this message in context: http://www.nabble.com/What-is-the-best-way-to-split-substring-words-tf3782977.html#a10698288
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: What is the best way to split substring words

Posted by Soeren Pekrul <so...@gmx.de>.
bhecht wrote:
> I want to split tokens against a list of known subwords.
> For example, given the subwords "strasse" and "gasse", the token
> "mainstrasse" should be split into the two tokens "main" and "strasse",
> and "maingasse" into "main" and "gasse".

IMBEMBA, PASQUALINO: A Splitter for German Compound Words. Free 
University of Bolzano, Bozen, 2006

http://www.gossamer-threads.com/lists/lucene/java-user/40164?do=post_view_threaded



Re: What is the best way to split substring words

Posted by Erick Erickson <er...@gmail.com>.
You probably should write a custom analyzer and/or filter that breaks
your streams up into the custom tokens you want. Depending upon
what you're really trying to accomplish, you may well need to use the
same analyzer at BOTH index and search times.
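
As a rough illustration of the splitting step such a filter would perform,
here is a minimal sketch in plain Java. It deliberately uses no Lucene
classes: SubwordSplitter and its split() method are hypothetical names made
up for this example, and wiring the logic into a real filter is left out.
It assumes tokens and subwords are already lowercased, and that a subword
only counts when it is a proper suffix of the token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not a Lucene class): splits a token on a known trailing subword. */
class SubwordSplitter {

    private final List<String> subwords;

    SubwordSplitter(List<String> subwords) {
        // Copy the list and try longer subwords first, so the longest
        // matching suffix wins when entries overlap.
        this.subwords = new ArrayList<>(subwords);
        this.subwords.sort(Comparator.comparingInt(String::length).reversed());
    }

    /**
     * Returns [prefix, subword] if the token ends with a known subword and
     * has a non-empty prefix; otherwise returns the token unchanged.
     * Assumption: token and subwords are already lowercased.
     */
    List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        for (String sub : subwords) {
            if (token.length() > sub.length() && token.endsWith(sub)) {
                int cut = token.length() - sub.length();
                parts.add(token.substring(0, cut));
                parts.add(token.substring(cut));
                return parts;
            }
        }
        parts.add(token);
        return parts;
    }

    public static void main(String[] args) {
        SubwordSplitter splitter =
                new SubwordSplitter(Arrays.asList("strasse", "gasse"));
        System.out.println(splitter.split("mainstrasse"));
        System.out.println(splitter.split("maingasse"));
    }
}
```

If something like this runs at index time, the same splitting would need to
run on query terms too, so that a search for "mainstrasse" is compared
against the tokens "main" and "strasse" rather than the unsplit word.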

Best
Erick

