Posted to java-user@lucene.apache.org by Miro Max <ki...@yahoo.de> on 2004/10/17 05:23:10 UTC
StopWord elimination pls. HELP
Hello,
I've got a problem with the stopword elimination function.
I'm using this code:
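GermanAnalyzer germanAnalyzer = new GermanAnalyzer();
// true = create a new index in directory "dbind", overwriting any existing one
IndexWriter writer = new IndexWriter("dbind", germanAnalyzer, true);
Document d = new Document();
String cont = rs.getString("x");
// Field.Text: tokenized and indexed, and the original value is stored as well
d.add(Field.Text("cont", cont));
writer.addDocument(d);
writer.close();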
to load the results from a database into a Lucene index. But when I
print the document with println(d), I can still see the German
stopwords. How can I eliminate them?
Thanks in advance,
Miro
___________________________________________________________
Sent from Yahoo! Mail - Now with 100MB of free storage - Sign up here: http://mail.yahoo.de
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: StopWord elimination pls. HELP
Posted by Daniel Naber <da...@t-online.de>.
On Sunday 17 October 2004 05:23, Miro Max wrote:
> d.add(Field.Text("cont", cont));
> writer.addDocument(d);
>
> to load the results from a database into a Lucene index. But when I
> print the document with println(d), I can still see the German
> stopwords. How can I eliminate them?
Field.Text("field", cont), where cont is a String, stores the original
text in addition to indexing it. Document.toString(), which println(d)
calls, then shows the stored text, stopwords included. The index itself
won't contain any stopwords.
Regards
Daniel
--
http://www.danielnaber.de
Re: StopWord elimination pls. HELP
Posted by Miro Max <ki...@yahoo.de>.
Thanks for your help.
--- Morus Walter <mo...@tanto.de> wrote:
> Miro Max writes:
>
> > String cont = rs.getString("x");
> > d.add(Field.Text("cont", cont));
> > writer.addDocument(d);
> >
> > to load the results from a database into a Lucene index. But when I
> > print the document with println(d), I can still see the German
> > stopwords. How can I eliminate them?
> >
> Stopwords in an analyzer don't make the stopwords disappear from the document;
> they only prevent them from being indexed.
> So you will always see stopwords in the document (before indexing and,
> if the field is stored, when the document is retrieved from the index).
>
> A meaningful check of whether stopwords are recognized would be to search for
> a stopword. You shouldn't find anything...
>
> HTH
> Morus
Re: StopWord elimination pls. HELP
Posted by Morus Walter <mo...@tanto.de>.
Miro Max writes:
> String cont = rs.getString("x");
> d.add(Field.Text("cont", cont));
> writer.addDocument(d);
>
> to load the results from a database into a Lucene index. But when I
> print the document with println(d), I can still see the German
> stopwords. How can I eliminate them?
>
Stopwords in an analyzer don't make the stopwords disappear from the document;
they only prevent them from being indexed.
So you will always see stopwords in the document (before indexing and,
if the field is stored, when the document is retrieved from the index).
A meaningful check of whether stopwords are recognized would be to search for
a stopword. You shouldn't find anything...
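The check suggested above can be modelled with a toy inverted index in plain Java. This is a hypothetical sketch, not Lucene's API: the class name, the five-word stopword list, and the search() method are assumptions for illustration. search() looks up the raw lowercased word without running it through the analyzer (roughly the way a Lucene TermQuery skips analysis), so an empty result means the stopword is really absent from the index rather than merely stripped from the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (NOT Lucene's API) illustrating the "search for a stopword" check.
class StopwordSearchCheck {

    // Assumed, illustrative stopword subset; a real analyzer's list is longer.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist"));

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Hypothetical analyzer: lowercase, split on non-letters, drop stopwords.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Only analyzed terms are added to the index, so stopwords never get postings.
    void addDocument(int id, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // Raw term lookup: the query word is lowercased but NOT filtered for stopwords.
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.GERMAN), Collections.emptySet());
    }

    public static void main(String[] args) {
        StopwordSearchCheck index = new StopwordSearchCheck();
        index.addDocument(1, "Der Hund und die Katze");
        index.addDocument(2, "Die Katze ist schwarz");
        System.out.println("und:   " + index.search("und"));   // und:   []
        System.out.println("Katze: " + index.search("Katze")); // Katze: [1, 2]
    }
}
```

Both documents contain "und" or "die" in their original text, yet a lookup for either finds nothing, while an ordinary word such as "Katze" matches both.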
HTH
Morus