Posted to pylucene-dev@lucene.apache.org by technology inspired <te...@gmail.com> on 2010/09/03 11:07:48 UTC
Stopwords in StandardAnalyzer; Constructor
Hi,
How can one define the list of stop words used by StandardAnalyzer?
According to the Lucene Java API docs, a set containing the stop words
should be passed to the constructor. I want to prevent a few words such as
"The", "on" and "off" from being skipped during indexing with
StandardAnalyzer.
How would one call such a constructor in PyLucene?
Regards,
Vin
Re: Stopwords in StandardAnalyzer; Constructor
Posted by Andi Vajda <va...@apache.org>.
On Fri, 3 Sep 2010, technology inspired wrote:
> How can one define the list of stop words used by StandardAnalyzer?
> According to the Lucene Java API docs, a set containing the stop words
> should be passed to the constructor. I want to prevent a few words such as
> "The", "on" and "off" from being skipped during indexing with
> StandardAnalyzer.
>
> How would one call such a constructor in PyLucene?
Stop words can be passed to StandardAnalyzer via a Set instance.
To do this, you can either:
  - add java.util.HashSet to PyLucene's jcc invocation in the Makefile,
    rebuild PyLucene and then use a HashSet instance (in the Makefile, look
    for java.util.Arrays and add java.util.HashSet below it), or
  - use the JavaSet class in the collections.py module that is installed
    with PyLucene. The JavaSet class is a Python class that extends
    PythonSet, a Java class that implements the java.util.Set interface.
    JavaSet takes a Python set instance, wraps it and makes its elements
    accessible to Java via the java.util.Set interface.
For example:
>>> from lucene import *
>>> from lucene.collections import JavaSet
>>> initVM()
<jcc.JCCEnv object at 0x10040a0d8>
>>> a=set(['foo', 'bar', 'baz'])
>>> b=JavaSet(a)
>>> b
<JavaSet: org.apache.pylucene.util.PythonSet@424ecfdd>
>>> StandardAnalyzer(Version.LUCENE_CURRENT, b)
<StandardAnalyzer: org.apache.lucene.analysis.standard.StandardAnalyzer@4430d82d>
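To keep words such as "The", "on" and "off" in the index, one possible
approach is to start from the default English stop words and remove the
ones you want indexed before wrapping the result in a JavaSet. The sketch
below is only an illustration: the DEFAULT_STOP_WORDS list is copied by
hand and is an assumption that should be checked against
StopAnalyzer.ENGLISH_STOP_WORDS_SET in your Lucene version, and the helper
names stop_words_without and make_analyzer are hypothetical, not part of
PyLucene. The words are lowercased because StandardAnalyzer lowercases
tokens before it applies the stop filter.

```python
# Assumption: the default English stop words of Lucene 3.x, copied by hand.
# Verify against StopAnalyzer.ENGLISH_STOP_WORDS_SET in your installation.
DEFAULT_STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
    "such", "that", "the", "their", "then", "there", "these", "they",
    "this", "to", "was", "will", "with",
])


def stop_words_without(keep, defaults=DEFAULT_STOP_WORDS):
    """Return the default stop words minus the words that must stay indexed."""
    # StandardAnalyzer lowercases tokens before stop filtering,
    # so compare in lowercase.
    keep_lower = set(word.lower() for word in keep)
    return set(defaults) - keep_lower


def make_analyzer(keep):
    """Hypothetical helper: build a StandardAnalyzer with the reduced set.

    Requires PyLucene 3.x and assumes initVM() has already been called.
    """
    from lucene import StandardAnalyzer, Version
    from lucene.collections import JavaSet
    return StandardAnalyzer(Version.LUCENE_CURRENT,
                            JavaSet(stop_words_without(keep)))


# Pure Python part, runnable without PyLucene:
custom_stop_words = stop_words_without(["The", "on", "off"])
```

Note that "off" does not appear in the list above, so under that assumption
it is indexed either way; removing it is harmless. With PyLucene installed
and the VM initialized, make_analyzer(["The", "on", "off"]) would then
return the analyzer to pass to the IndexWriter.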
Andi..