Posted to user@lucenenet.apache.org by "T. R. Halvorson" <tr...@midrivers.com> on 2009/12/25 22:36:17 UTC

No Stopwords

To initialize the StandardAnalyzer to use no stop words, the following code 
seems to work. I question, however, whether it is the best way to do it. Is 
there a better way?

One wonders whether having a string array with one element whose value is 
set to the empty string causes the analyzer to repeatedly check the array 
and examine its one element. Besides the risk of a performance loss, the 
approach in the code below seems like a kludge.

C#
------------------------
using Lucene.Net.Analysis.Standard;

// about to initialize StandardAnalyzer to use no stopwords
// apparently Lucene uses a hash table with keys internally for stoplist
// need to prevent Key from being Null
//
// create a one-element string array (its element starts out as null)
string[] saStopWords = new string[1];
// keep Key from being Null
saStopWords[0] = string.Empty;
// instantiate StandardAnalyzer to use no stopwords
StandardAnalyzer lucAnalyzer = new StandardAnalyzer(saStopWords);

Visual Basic
------------------------
Imports Lucene.Net.Analysis.Standard

' about to initialize StandardAnalyzer to use no stopwords
' apparently Lucene uses a hash table with keys internally for stoplist
' need to prevent Key from being Null
'
' set object reference to string array of stopwords to an instance
Dim saStopWords(0) As String
' keep Key from being Null
saStopWords(0) = String.Empty
' instantiate StandardAnalyzer to use no stopwords
Dim lucAnalyzer As New StandardAnalyzer(saStopWords)

I figure that using the first overload with no parameter specifying 
stopwords instantiates the analyzer to use the default stoplist for that 
analyzer. If that's so, then that overload is out. It won't let me analyze 
using no stopwords.

The overload whose single parameter is a string array works, but only if the 
string array reference is set to an instance with one element, hence:

     string[] saStopWords = new string[1];

     or

     Dim saStopWords(0) As String

and only if Key is not Null, hence:

     saStopWords[0] = string.Empty;

     or

     saStopWords(0) = String.Empty

So there ya go, with the analyzer probably repeatedly examining the array 
and its single element.
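A minimal sketch of why the one-element array may not cost much, assuming (as the code comments above suspect) that the stop list is copied into a hash-based set once and each token is then checked with a single lookup. `StopSetSketch`, `MakeStopSet`, and `Filter` are hypothetical names for illustration, not Lucene.Net source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not Lucene.Net source: a stop filter that turns the
// stop-word array into a hash set once, then does one lookup per token.
public static class StopSetSketch
{
    // Built once (e.g. when the analyzer is constructed); the string[] is not
    // rescanned for every token afterwards.
    public static HashSet<string> MakeStopSet(string[] stopWords)
    {
        return new HashSet<string>(stopWords, StringComparer.Ordinal);
    }

    // Keeps every token that is not in the stop set and reports how many
    // set lookups were performed (always one per token).
    public static List<string> Filter(IEnumerable<string> tokens,
                                      HashSet<string> stopSet,
                                      out int lookups)
    {
        var kept = new List<string>();
        lookups = 0;
        foreach (string token in tokens)
        {
            lookups++;
            if (!stopSet.Contains(token))
                kept.Add(token);
        }
        return kept;
    }
}
```

Under this model the `{ string.Empty }` set and a truly empty set cost the same: four tokens in means four lookups and four tokens out, because the empty-string entry could only ever match an empty token.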

I wonder whether the overload that takes a hashtable as a parameter can be 
used in some way that prevents repeated, useless examination of the table.
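On the Hashtable question, a sketch assuming (as the code comments above suspect) that the stop list ends up as keys in a `System.Collections.Hashtable`. Under that assumption the null-key failure comes only from adding a null key, so an empty Hashtable would express "no stop words" without any placeholder entry. `HashtableStopSketch`, `ToStopTable`, and `IsStopWord` are hypothetical names; whether StandardAnalyzer's Hashtable overload accepts an empty table (for example `new StandardAnalyzer(new Hashtable())`) is an assumption to verify against the Lucene.Net version in use.

```csharp
using System;
using System.Collections;

// Hypothetical sketch, not Lucene.Net source: stop words stored as keys
// in a non-generic Hashtable.
public static class HashtableStopSketch
{
    public static Hashtable ToStopTable(string[] stopWords)
    {
        var table = new Hashtable();
        foreach (string word in stopWords)
        {
            // Hashtable rejects null keys with ArgumentNullException, which is
            // why a one-element array left at its default (null) value fails.
            table[word] = word;
        }
        return table;
    }

    // True when the token would be dropped as a stop word.
    public static bool IsStopWord(Hashtable stopTable, string token)
    {
        return stopTable.ContainsKey(token);
    }
}
```

An empty table never needs a key, so it sidesteps the null-key problem entirely while still answering every lookup with "not a stop word".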

Any ideas?

Thanks for any help.

T. R.
trh@midrivers.com
http://www.linkedin.com/in/trhalvorson
www.ncodian.com
http://twitter.com/trhalvorson 


RE: No Stopwords

Posted by Digy <di...@gmail.com>.
I don't think that there will be a measurable performance difference between
using a Hashtable and a string[].

Just a reminder: the StandardAnalyzer constructors that accept a string[] of
stopwords are deprecated and will be removed in 3.0.

DIGY

-----Original Message-----
From: T. R. Halvorson [mailto:trh@midrivers.com] 
Sent: Friday, December 25, 2009 11:36 PM
To: lucene-net-user@lucene.apache.org
Subject: No Stopwords
