Posted to solr-user@lucene.apache.org by Mike Klaas <mi...@gmail.com> on 2008/04/01 00:39:53 UTC

Re: Indexing a word in url

On 31-Mar-08, at 10:50 AM, Vinci wrote:
>
> Hi all,
>
> I would like to ask: if I want to index the words in a URL, which data
> type and parser should I use?

It depends on how you want to search it. I use WordDelimiterFilter with
only part generation turned on (no catenation), plus an additional stopword
list that excludes a few tokens such as 'http'.
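Below is a rough Python approximation of that chain, included to show which tokens come out of a URL. It is a sketch under assumptions, not Solr's WordDelimiterFilter. It splits only on non-alphanumeric characters, lowercase-to-uppercase changes, and letter/digit boundaries, and it emits just the parts (no catenated forms). It then lowercases them, as a following lowercase filter would, and drops stopwords. Only 'http' comes from the post: 'https' and 'www' in the stopword set are assumed examples, and the function names are hypothetical.

```python
import re

# 'http' is from the post above; 'https' and 'www' are assumed extras.
URL_STOPWORDS = frozenset({"http", "https", "www"})


def word_parts(text):
    """Approximate part generation: split on non-alphanumerics,
    lowercase->uppercase changes, and letter<->digit boundaries.
    No catenated tokens are produced."""
    parts = []
    for run in re.findall(r"[A-Za-z0-9]+", text):
        start = 0
        for i in range(1, len(run)):
            prev, cur = run[i - 1], run[i]
            case_change = prev.islower() and cur.isupper()
            digit_change = prev.isdigit() != cur.isdigit()
            if case_change or digit_change:
                parts.append(run[start:i])
                start = i
        parts.append(run[start:])
    return parts


def analyze_url(url):
    """Parts, lowercased (as a later lowercase filter would do), minus stopwords."""
    tokens = [part.lower() for part in word_parts(url)]
    return [tok for tok in tokens if tok not in URL_STOPWORDS]


print(analyze_url("http://www.example.com/SolrWiki/Release1_3.html"))
# ['example', 'com', 'solr', 'wiki', 'release', '1', '3', 'html']
```

In this sketch an all-caps run followed by lowercase letters, such as 'XMLHttp', is left whole, because only lowercase-to-uppercase changes cause a split.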

-Mike

Re: Indexing a word in url

Posted by Simon Rosenthal <si...@yahoo.com>.
I also couldn't get the exact results I wanted for indexing URL components
using WordDelimiterFilter or PatternTokenizer, so I resorted to adding a new
field ('pathparts') plus a few lines of code in our content preprocessor,
which submits documents to Solr for indexing, to generate its tokens.
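The post doesn't include that code, so the following is only a guess at the kind of preprocessing it describes: a hypothetical Python helper that derives values for a 'pathparts' field from the URL path. The token scheme (each lowercased path segment plus each cumulative path prefix) is an assumption, and so are the 'id' and 'url' field names. Values like these would suit a multi-valued, untokenized field (for example a string field), so that each one is indexed as-is.

```python
from urllib.parse import urlsplit


def path_parts(url):
    """Hypothetical 'pathparts' values: each path segment plus each
    cumulative prefix, lowercased, without duplicates."""
    segments = [seg.lower() for seg in urlsplit(url).path.split("/") if seg]
    # "docs", "docs/solr", "docs/solr/schema.html", ...
    prefixes = ["/".join(segments[:i]) for i in range(1, len(segments) + 1)]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(segments + prefixes))


def build_doc(url):
    """Document dict a preprocessor could hand to its Solr client.
    The 'id' and 'url' field names are assumed."""
    return {"id": url, "url": url, "pathparts": path_parts(url)}


print(path_parts("http://example.com/Docs/Solr/schema.html"))
# ['docs', 'solr', 'schema.html', 'docs/solr', 'docs/solr/schema.html']
```

Storing the cumulative prefixes means an exact match on one prefix value finds every document under that directory.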

-Simon


Re: Indexing a word in url

Posted by Chris Hostetter <ho...@fucit.org>.
: Actually, I want any character that is not a letter or digit to act as a
: separator, with anything between separators treated as a word (so that I
: can use a URL fragment to see what is indexed for a site). Any suggestions?

In addition to Mike's suggestion of trying out the WordDelimiterFilter, 
take a look at the PatternTokenizerFactory.
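The Python sketch below shows what splitting on "anything that is not a letter or digit" yields in split mode. The regex [^A-Za-z0-9]+ is an assumed choice and covers ASCII only. A \W-based pattern would keep underscores inside tokens. In Solr the regex would presumably go in the tokenizer's pattern attribute. This code only imitates the resulting tokens, and it drops the empty strings a leading or trailing separator leaves behind.

```python
import re

# Assumed separator: one or more characters that are not ASCII letters or digits.
SEPARATOR = re.compile(r"[^A-Za-z0-9]+")


def pattern_tokenize(text):
    """Split on SEPARATOR and drop the empty strings left by a leading
    or trailing separator."""
    return [tok for tok in SEPARATOR.split(text) if tok]


print(pattern_tokenize("http://www.example.com/solr-user/2008_04"))
# ['http', 'www', 'example', 'com', 'solr', 'user', '2008', '04']
```
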



-Hoss


Re: Indexing a word in url

Posted by Vinci <vi...@polyu.edu.hk>.
Hi,

Thank you for your reply.
Actually, I want any character that is not a letter or digit to act as a
separator, with anything between separators treated as a word (so that I
can use a URL fragment to see what is indexed for a site). Any suggestions?

Thank you,
Vinci


