Posted to java-user@lucene.apache.org by Miro Max <ki...@yahoo.de> on 2004/10/17 05:23:10 UTC

StopWord elimination pls. HELP

Hello,

I've got a problem with the stopword elimination function.
I'm trying to use this code:

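import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

GermanAnalyzer germanAnalyzer = new GermanAnalyzer();
// true = create a new index in the "dbind" directory
IndexWriter writer = new IndexWriter("dbind", germanAnalyzer, true);

Document d = new Document();      // declaration not shown in the original post
String cont = rs.getString("x");  // rs is presumably a JDBC ResultSet
d.add(Field.Text("cont", cont));
writer.addDocument(d);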

to get results from a database into a Lucene index. But
when I check println(d) I can still see the German stopwords.
How can I eliminate them?

Thanks in advance,

Miro

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: StopWord elimination pls. HELP

Posted by Daniel Naber <da...@t-online.de>.
On Sunday 17 October 2004 05:23, Miro Max wrote:

> d.add(Field.Text("cont", cont));
> writer.addDocument(d);
>
> to get results from a database into lucene index. but
> when i check println(d) i can see the german stopwords
> too. how can i eliminate this?

Field.Text("field", cont), where cont is a String, also stores the
original text in addition to indexing it. toString() then shows the
stored text. The index itself won't contain any stopwords.

Regards
 Daniel

-- 
http://www.danielnaber.de
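
To illustrate the stored-versus-indexed distinction described above, here is a
minimal sketch in plain Java. It does not use Lucene at all: the class name,
the five-word stopword list and the tokenizing rule are all assumptions made
for illustration, and the real GermanAnalyzer does more (it ships its own
stopword list and also stems terms). The point is only that the stored string
stays untouched while the analyzed terms lose their stopwords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StoredVsIndexed {
    // Hypothetical, tiny stopword list (an assumption, not Lucene's list).
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist"));

    // Toy analysis step: lowercase, split on non-letters, drop stopwords.
    static List<String> indexedTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String cont = "Der Hund und die Katze";
        // The stored value is the original string, untouched by analysis.
        System.out.println("stored:  " + cont);
        // Only the analyzed terms would go into the inverted index.
        System.out.println("indexed: " + indexedTerms(cont));
    }
}
```

Printing the document corresponds to the "stored" line here, which is why the
stopwords are still visible in println(d).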



Re: StopWord elimination pls. HELP

Posted by Miro Max <ki...@yahoo.de>.
Thanks for your help.

 --- Morus Walter <mo...@tanto.de> wrote:
> Miro Max writes:
>
> > String cont = rs.getString("x");
> > d.add(Field.Text("cont", cont));
> > writer.addDocument(d);
> >
> > to get results from a database into lucene index. but
> > when i check println(d) i can see the german stopwords
> > too. how can i eliminate this?
> >
> Stopwords in an analyzer don't make the stopwords disappear from the document,
> they only prevent them from being indexed.
> So you will always see stopwords in the document (before indexing and,
> if the field is stored, when the document is retrieved from the index).
>
> A meaningful check, if stopwords are recognized, would be to search for
> a stopword. You shouldn't find anything...
>
> HTH
> 	Morus


Re: StopWord elimination pls. HELP

Posted by Morus Walter <mo...@tanto.de>.
Miro Max writes:

> String cont = rs.getString("x");
> d.add(Field.Text("cont", cont));
> writer.addDocument(d);
> 
> to get results from a database into lucene index. but
> when i check println(d) i can see the german stopwords
> too. how can i eliminate this?
> 
Stopwords in an analyzer don't make the stopwords disappear from the document;
they only prevent them from being indexed.
So you will always see stopwords in the document (before indexing and,
if the field is stored, when the document is retrieved from the index).

A meaningful check of whether stopwords are recognized would be to search for
a stopword. You shouldn't find anything...

HTH
	Morus
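
The check suggested above (search for a stopword and expect no hits) can be
sketched with a toy inverted index in plain Java. This is not Lucene code: the
class, its fields and the stopword list are hypothetical stand-ins. In Lucene
itself, the analogous check would probably be a TermQuery on the stopword,
since a query parsed with the same analyzer would have the stopword removed
before the search runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StopwordSearchCheck {
    // Hypothetical stopword list (an assumption, not Lucene's list).
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist"));

    // term -> ids of the documents that contain it
    final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> original text, playing the role of a stored field
    final List<String> stored = new ArrayList<>();

    // Stores the text as-is and indexes only its non-stopword terms.
    int addDocument(String text) {
        int id = stored.size();
        stored.add(text);
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Exact term lookup; returns an empty set when the term was never indexed.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.GERMAN),
                Collections.emptySet());
    }

    public static void main(String[] args) {
        StopwordSearchCheck index = new StopwordSearchCheck();
        index.addDocument("Der Hund und die Katze");
        System.out.println("hits for 'und':  " + index.search("und").size());
        System.out.println("hits for 'hund': " + index.search("hund").size());
        System.out.println("stored text:     " + index.stored.get(0));
    }
}
```

The stopword lookup comes back empty even though the retrieved text still
contains the word, which matches the behaviour described in this thread.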
