Posted to java-user@lucene.apache.org by masz-wow <ma...@gmail.com> on 2007/08/01 06:24:28 UTC

Problem Search using lucene

I understand that only documents that have been indexed can be searched.
I have already managed to index the document and to search its contents.
The problem is: why are there a few words that cannot be searched for?
E.g.: a document contains this passage:
"So on the next Monday, when Big taufiq John once again got on the bus and
said, keratong >"Big John doesn't pay!" The driver stood up, glared back at
the passenger, >and screamed, "And why not?" With a surprised look on his
face, Big John >replied, "Big John has a bus pass." Managementq "

I can search for the rest of this document's contents, BUT when I key in the word
'on' or 'and', the document is no longer found.

From my understanding, once a document has been indexed, we should be able to
search all of its contents.
-- 
View this message in context: http://www.nabble.com/Problem-Search-using-lucene-tf4197963.html#a11939477
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Problem Search using lucene

Posted by Michael Wechner <mi...@wyona.com>.
Chhabra, Kapil wrote:

>You just have to make sure that what you are searching is indexed (and
>esp. in the same format/case).
>Use Luke (http://www.getopt.org/luke/) to browse through your index.
>  
>

Does Luke also work with Nutch?

Thanks

Michael

>This might give you an insight of what you have indexed and what you are
>searching for.
>
>Regards,
>kapilChhabra
>

-- 
Michael Wechner
Wyona      -   Open Source Content Management - Yanel, Yulup
http://www.wyona.com
michael.wechner@wyona.com, michi@apache.org
+41 44 272 91 61




RE: Problem Search using lucene

Posted by "Chhabra, Kapil" <kc...@akamai.com>.
You just have to make sure that what you are searching for is indexed (and
especially in the same format/case).
Use Luke (http://www.getopt.org/luke/) to browse through your index.
This might give you some insight into what you have indexed and what you are
searching for.

Regards,
kapilChhabra
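
The point about format and case can be illustrated with a toy inverted index.
The sketch below is plain Java, not Lucene code: IndexLookupDemo, normalize,
buildIndex, and search are hypothetical names, and lowercasing stands in for
whatever the real analyzer does at index time. It shows that a term lowercased
while indexing is only found when the query term is lowercased the same way.
The sorted term list is roughly the kind of view an index browser gives.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IndexLookupDemo {
    // Assumption: the index-time analysis simply lowercases each term.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Builds a map from normalized term to the ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String raw : docs.get(id).split("\\s+")) {
                if (!raw.isEmpty()) {
                    index.computeIfAbsent(normalize(raw), k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Exact-match lookup: the term is used as given, with no normalization.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index =
                buildIndex(List.of("Big John has a bus pass", "The driver stood up"));
        // What is actually stored: [a, big, bus, driver, has, john, pass, stood, the, up]
        System.out.println(index.keySet());
        // Raw query term differs in case from the stored term: prints []
        System.out.println(search(index, "John"));
        // Query term normalized like the indexed terms: prints [0]
        System.out.println(search(index, normalize("John")));
    }
}
```

The design choice worth noting is that search does no normalization of its own,
which mirrors what happens when a query term bypasses the analyzer that was
used for indexing.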



Re: Problem Search using lucene

Posted by masz-wow <ma...@gmail.com>.
Thanks Joe

I'm using this function as my analyzer

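public static Analyzer getDefaultAnalyzer() {
	// StopAnalyzer is the default for any field not registered below
	PerFieldAnalyzerWrapper perFieldAnalyzer =
			new PerFieldAnalyzerWrapper(new StopAnalyzer());
	// "contents": lowercased, stop words removed (same as the default)
	perFieldAnalyzer.addAnalyzer("contents", new StopAnalyzer());
	// "fileID": split on whitespace only, case preserved
	perFieldAnalyzer.addAnalyzer("fileID", new WhitespaceAnalyzer());
	// "path": the whole value is kept as a single token
	perFieldAnalyzer.addAnalyzer("path", new KeywordAnalyzer());
	return perFieldAnalyzer;
}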

StopAnalyzer builds an analyzer that removes the words in
ENGLISH_STOP_WORDS. That might be why I cannot search for words such as
'and' or 'to'.

BUT

I'm still having problems when I search for some words other than English words,
such as a name (e.g. Ghazat) or a string of numbers (e.g. 45600).
-- 
View this message in context: http://www.nabble.com/Problem-Search-using-lucene-tf4197963.html#a11940536
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
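
One possible explanation for the name and number cases, offered as a sketch
rather than a diagnosis: in Lucene releases of that period, StopAnalyzer
tokenizes with a letter-based, lowercasing tokenizer, so digits never become
terms and "Ghazat" is stored as "ghazat". The plain-Java model below does not
use Lucene at all. FieldAnalysisDemo and its three methods are hypothetical
names that only imitate how the three analyzers in the function above treat
the same input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FieldAnalysisDemo {
    // Models a letter-based, lowercasing tokenizer (the "contents" field):
    // anything that is not a letter only separates tokens.
    static List<String> letterLowerCase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Models whitespace tokenization (the "fileID" field): case and digits survive.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Models keyword analysis (the "path" field): the whole value is one token.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String value = "Ghazat 45600";
        System.out.println(letterLowerCase(value)); // prints [ghazat]
        System.out.println(whitespace(value));      // prints [Ghazat, 45600]
        System.out.println(keyword(value));         // prints [Ghazat 45600]
    }
}
```

Under these assumptions, a search for 45600 in "contents" cannot match because
no such term was ever indexed, and a search for Ghazat only matches if the
query term is lowercased the same way as the indexed text.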




Re: Problem Search using lucene

Posted by Joe Attardi <ja...@gmail.com>.
You are probably using the StandardAnalyzer, which removes stop words such as
"and".

-- 
Joe Attardi
jattardi@gmail.com
http://thinksincode.blogspot.com/
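
The effect of stop-word removal can be sketched in a few lines of plain Java.
This is a simplified model, not Lucene's implementation: StopWordDemo and
analyze are hypothetical names, and STOP_WORDS is a small subset chosen for
illustration. Because stop words are dropped before anything is written to
the index, a query for 'on' or 'and' has no term to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordDemo {
    // Illustrative subset of English stop words (an assumption, not the full list).
    static final Set<String> STOP_WORDS = Set.of("a", "and", "on", "the", "to");

    // Toy analyzer: split at non-letters, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed =
                analyze("So on the next Monday, Big John got on the bus and said");
        // prints [so, next, monday, big, john, got, bus, said]
        System.out.println(indexed);
        System.out.println(indexed.contains("on"));     // prints false
        System.out.println(indexed.contains("monday")); // prints true
    }
}
```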
