Posted to user@lucenenet.apache.org by Justin Furniss <pr...@gmail.com> on 2007/11/08 04:37:22 UTC

Indexing weird words

Hi all

I'll start off by saying I am new to Lucene and my real problem may be not
knowing how to ask this question...

I have an index that I am creating over a long period of time, on some days
adding thousands of results to the index one by one.  I am running into a
situation where a user is attempting to search for the term 'diabecell' and
is getting nothing back.  I know that there are several indexed pages that
contain this term but they are not returned as results.  So my question...
How does Lucene know what 'words' to index in a page, and how can I tell it
to index more 'words'?  Is there a way to tell it to run back through
the indexed items and look for a certain word to add to the index?

Any help is appreciated!!!!

Justin

-- 
Ford.. There's an infinite number of monkeys outside that want to show us
the script for Hamlet they've worked out

RE: Indexing weird words

Posted by DIGY <di...@gmail.com>.
Hi,

First, you can take a look at
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

> How does Lucene know what 'words' to index in a page 
Lucene's analyzers (which, in turn, use tokenizers) split the sequence of
characters (the text to be indexed) into meaningful parts (words).
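
To make that concrete, here is a rough sketch of what "splitting text into
tokens" means. It is a toy written against the Java standard library only;
ToyAnalyzer and its tokenize method are made-up names for illustration and
are not part of Lucene or Lucene.Net, whose real analyzers are more
elaborate.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: this is NOT Lucene's Analyzer/Tokenizer API.
public class ToyAnalyzer {

    // Split text into lowercase tokens at every character that is
    // neither a letter nor a digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [diabecell, a, new, dialog, xyzabcqqq]
        System.out.println(tokenize("DiabeCell, a new dialog: xyzabcqqq!"));
    }
}
```

The unusual word comes out as a token exactly like the ordinary ones; the
rule only looks at which characters are letters or digits.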

> how can I specify for it to index more 'words'?
Unless you write your own analyzer, you are restricted to the logic of the
analyzer you are using. (Lucene's analyzers are enough for most cases.)

> I am running into a situation where a user is attempting to search for the
term 'diabecell' and is getting nothing back
Lucene sees no difference between the words 'diabecell', 'dialog' and
'xyzabcqqq', since they are tokenized using syntactic rules (not semantic
ones).
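
Because the rules are purely syntactic, a search term only matches if the
analyzer produced exactly that token at index time. The sketch below shows
one way a word can be "in the page" and still not be found. This is only an
assumption about a possible cause, not a diagnosis of the index in question,
and ToyIndex with its two tokenizing helpers is a hypothetical stand-in, not
Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy inverted index; illustrative only, not Lucene's API.
public class ToyIndex {
    // term -> ids of the documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Function<String, List<String>> analyzer;

    public ToyIndex(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    public void add(int docId, String text) {
        for (String token : analyzer.apply(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    // Exact term lookup: the term must equal a token stored at index time.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term, new HashSet<>());
    }

    // Splits on whitespace only, keeping case and punctuation.
    public static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    // Lowercases and splits on anything that is not a letter or digit.
    public static List<String> lettersLowercased(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        String page = "Trials of DiabeCell, a new product";

        ToyIndex a = new ToyIndex(ToyIndex::whitespace);
        a.add(1, page);
        System.out.println(a.search("diabecell")); // prints []

        ToyIndex b = new ToyIndex(ToyIndex::lettersLowercased);
        b.add(1, page);
        System.out.println(b.search("diabecell")); // prints [1]
    }
}
```

With the whitespace-only rule the stored token is "DiabeCell," (capitals and
comma included), so the lowercase query finds nothing. A useful first check
is to look at the tokens your analyzer really emits for one of the pages,
and to make sure the query goes through the same analyzer.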

> Is there a way to tell it to run back through the indexed items and look
for a certain word to add to the index
No. You have to delete the affected documents and add them again (that is,
update them).
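
As a minimal sketch of "update means delete, then add": the toy index below
keeps, for each page, the terms it contributed, so an update can remove the
old entries and re-analyze the new text. ToyUpdatableIndex and all of its
methods are invented for this illustration, and the simple tokenizer is an
assumption; this is not how Lucene stores its index internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy index showing update as delete + add; not Lucene's API.
public class ToyUpdatableIndex {
    // term -> keys of the pages containing it
    private final Map<String, Set<String>> postings = new HashMap<>();
    // page key -> terms that page contributed (needed to undo an add)
    private final Map<String, List<String>> docTerms = new HashMap<>();

    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public void add(String key, String text) {
        List<String> tokens = tokenize(text);
        docTerms.put(key, tokens);
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(key);
        }
    }

    public void delete(String key) {
        List<String> tokens = docTerms.remove(key);
        if (tokens == null) return; // nothing indexed under this key
        for (String t : tokens) {
            Set<String> docs = postings.get(t);
            if (docs != null) {
                docs.remove(key);
                if (docs.isEmpty()) postings.remove(t);
            }
        }
    }

    // There is no "patch one word in": the whole page is re-analyzed.
    public void update(String key, String text) {
        delete(key);
        add(key, text);
    }

    public boolean matches(String term, String key) {
        Set<String> docs = postings.get(term);
        return docs != null && docs.contains(key);
    }

    public static void main(String[] args) {
        ToyUpdatableIndex index = new ToyUpdatableIndex();
        index.add("page1", "old text about insulin");
        index.update("page1", "new text about diabecell");
        System.out.println(index.matches("diabecell", "page1")); // prints true
        System.out.println(index.matches("insulin", "page1"));   // prints false
    }
}
```

The practical consequence is that you need the original text of each page
(or a way to fetch it again) in order to re-index it.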

DIGY
