Posted to solr-user@lucene.apache.org by Mike Klaas <mi...@gmail.com> on 2008/04/01 00:39:53 UTC
Re: Indexing a word in url
On 31-Mar-08, at 10:50 AM, Vinci wrote:
>
> Hi all,
>
> I would like to ask, if I want to index word in a URL, which data
> type and
> parser should I use?
Depends on how you want to search it. I use WordDelimiterFilter with
only parts generation turned on (no catenation), and an additional
stopword list that excludes a few tokens like 'http'.
-Mike
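
[Editor's sketch] To make the effect described above concrete, here is a rough Python approximation of "parts only, no catenation, plus stopwords". It is not Solr's WordDelimiterFilter and does not reproduce its exact rules; the function name url_parts, the regular expression, and the stopword set (beyond 'http', the only token named in the message) are assumptions for illustration, and only ASCII letters and digits are handled.

```python
import re

# Hypothetical stopword set; the thread only names 'http' as an example.
URL_STOPWORDS = {"http", "https", "www"}

# A part is: an optionally capitalised lowercase run ("solr", "Wiki"),
# an all-uppercase run not followed by lowercase ("XML"), or a digit run.
_PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")

def url_parts(url):
    """Split a URL into lowercased parts and drop stopword tokens."""
    parts = [m.group(0).lower() for m in _PART.finditer(url)]
    # No catenation: parts are never joined back together.
    return [p for p in parts if p not in URL_STOPWORDS]

print(url_parts("http://www.example.com/solrWiki/page2008.html"))
```

With this sketch, a query for "wiki" or "2008" would match the example URL, while "http" would not produce a token at all.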
Re: Indexing a word in url
Posted by Simon Rosenthal <si...@yahoo.com>.
I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a new
field ('pathparts'), plus a few lines of code to generate the tokens in our
content preprocessor, which submits documents to Solr for indexing.
-Simon
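
[Editor's sketch] The preprocessor code is not shown in the thread, so the following is only a guess at what such a step might look like, written in Python. Only the field name 'pathparts' comes from the message; the function names, the 'url' input field, and the choice to tokenize just the path component are assumptions.

```python
import re
from urllib.parse import urlsplit

def path_parts(url):
    """Return the lowercased alphanumeric tokens found in a URL's path."""
    path = urlsplit(url).path  # ignores scheme, host, query and fragment
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]

def add_pathparts(doc, url_field="url"):
    """Return a copy of doc with a 'pathparts' field derived from its URL."""
    enriched = dict(doc)  # leave the caller's document untouched
    enriched["pathparts"] = path_parts(doc[url_field])
    return enriched

doc = {"id": "1", "url": "http://www.example.com/docs/solr-user/index.html?q=1"}
print(add_pathparts(doc)["pathparts"])
```

Generating the tokens before submission keeps the analysis chain for the new field trivial, at the cost of having the tokenization logic live outside the schema.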
On Tue, Apr 1, 2008 at 7:24 PM, Chris Hostetter <ho...@fucit.org>
wrote:
>
> : Actually I want to use anything that is not alphabet or digit to be the
> : separator - anything between them will be a word (so that I can use the URL
> : fragment to see what is indexed about this site)...any suggestion?
>
> In addition to Mike's suggestion of trying out the WordDelimiterFilter,
> take a look at the PatternTokenizerFactory.
>
>
>
> -Hoss
>
>
Re: Indexing a word in url
Posted by Chris Hostetter <ho...@fucit.org>.
: Actually I want to use anything that is not alphabet or digit to be the
: separator - anything between them will be a word (so that I can use the URL
: fragment to see what is indexed about this site)...any suggestion?
In addition to Mike's suggestion of trying out the WordDelimiterFilter,
take a look at the PatternTokenizerFactory.
-Hoss
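
[Editor's sketch] The rule asked for in the quoted question (every run of non-alphanumeric characters is a separator) can be written as a single regular expression. The Python below only demonstrates that rule on a string; the names SEPARATOR and tokenize are hypothetical, and whether the same expression behaves identically when supplied as the pattern of PatternTokenizerFactory is an assumption that has not been checked here.

```python
import re

# Any run of characters that is not an ASCII letter or digit is a separator.
SEPARATOR = re.compile(r"[^A-Za-z0-9]+")

def tokenize(text):
    """Split text on separators and discard the empty strings at the edges."""
    return [tok for tok in SEPARATOR.split(text) if tok]

print(tokenize("http://www.nabble.com/Indexing-a-word-in-url"))
```

Unlike the word-delimiter approach, this keeps mixed-case and letter-digit runs such as "page2008" as single tokens.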
Re: Indexing a word in url
Posted by Vinci <vi...@polyu.edu.hk>.
Hi,
Thank you for your reply.
Actually I want to use anything that is not alphabet or digit to be the
separator - anything between them will be a word (so that I can use the URL
fragment to see what is indexed about this site)...any suggestion?
Thank you,
Vinci
Mike Klaas wrote:
>
>
> On 31-Mar-08, at 10:50 AM, Vinci wrote:
>>
>> Hi all,
>>
>> I would like to ask, if I want to index word in a URL, which data
>> type and
>> parser should I use?
>
> Depends on how you want to search it. I use WordDelimiterFilter with
> only parts generation turned on (no catenation), and an additional
> stopword list that excludes a few tokens like 'http'.
>
> -Mike
>
>