Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2004/04/22 19:08:15 UTC
Re: more rigid stopword list ?
Moving to lucene-user list.
One of my Lucene articles includes a more comprehensive stop word list
for English:
http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2#references
Otis
--- hgadm@cswebmail.com wrote:
> Dear all,
>
> for my taste the stopwords included in Lucene (e.g.
> StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
> with the SnowballAnalyzer - and I guess also with the
> StandardAnalyzer) are not strict enough:
>
> For example in a sentence with "we need ..." I would
> consider "we" and "need" as stopwords but they are not
> stripped by SnowballAnalyzer or StandardAnalyzer.
>
> Now:
> Is there a built-in solution for more restrictive
> stripping, or should I create my own analyzer in that
> case with a more restrictive stopword list?
>
> If so, are you aware of more rigid lists? (A URI
> would be great!)
>
> Thanks,
>
> Holger
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Doing a join?
Posted by jitender ahuja <aj...@aalayance.com>.
Hi,
I assume that by different files you mean two different subdirectories
under a common directory, each holding its own index.
One index stores 3 fields and the other stores 2 fields.
The Class_Id or Class_Name field (file 1) and the Student_Id field
(file 2) should be both indexed and stored in the index.
You need to write your own code to implement the join and perform the
search.
The code snippet below, if included in your own searcher code, should
provide the results on the command line.
I have not tested it, so in case of problems please send me a small
sample database of the two tables to be indexed (not the index files),
or something similar.
Regards,
Jitender
code:
========
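// imports needed by the snippet (Lucene 1.x package names)
import java.util.ArrayList;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

String searchStr = null; // set this to the Class_Id value to search for
ArrayList results = new ArrayList();  // Student_Id values found in "file1"
ArrayList results1 = new ArrayList(); // Student_Name values found in "file2"
// "file1" is the index directory that stores the 3 fields
Searcher indexSrch = new IndexSearcher("file1");
Analyzer analyzer = new StandardAnalyzer();
Query query = QueryParser.parse(searchStr, "Class_Id", analyzer);
Hits hits = indexSrch.search(query);
for (int i = 0; i < hits.length(); i++)
{
    Document doc = hits.doc(i);
    String id = doc.get("Student_Id"); // retrieves the student's Id
    results.add(id);
}
// feed each entry of "results" to query1 as "searchStr1"
// "file2" is the index directory that stores the 2 fields
Searcher indexSrch1 = new IndexSearcher("file2");
if (hits.length() != 0)
{
    for (int j = 0; j < results.size(); j++)
    {
        String searchStr1 = (String) results.get(j);
        Query query1 = QueryParser.parse(searchStr1, "Student_Id", analyzer);
        Hits hits1 = indexSrch1.search(query1);
        for (int k = 0; k < hits1.length(); k++)
        {
            Document doc = hits1.doc(k); // was doc(0), which re-read the first hit
            String studentName = doc.get("Student_Name");
            results1.add(studentName);
        }
    }
}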
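The two-pass idea behind the snippet above can also be tried without Lucene. The following is only a sketch under stated assumptions: documents are modelled as plain maps in a single list (as in Rob's one-index setup), the field names "ClassId", "StudentId" and "StudentName" are taken from his question, and the class name, helper methods and sample data are hypothetical. It does not use any Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwoPassJoinSketch {

    // Builds a tiny stand-in for a stored document from key/value pairs.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            fields.put(keyValues[i], keyValues[i + 1]);
        }
        return fields;
    }

    // Pass 1: collect the distinct StudentId values of documents matching the class.
    static Set<String> studentIdsForClass(List<Map<String, String>> docs, String classId) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map<String, String> d : docs) {
            if (classId.equals(d.get("ClassId"))) {
                ids.add(d.get("StudentId"));
            }
        }
        return ids;
    }

    // Pass 2: return StudentName for every document whose StudentId was found in pass 1.
    // Documents without a StudentName field (the class documents) are skipped.
    static List<String> studentNames(List<Map<String, String>> docs, Set<String> studentIds) {
        List<String> names = new ArrayList<>();
        for (Map<String, String> d : docs) {
            String name = d.get("StudentName");
            if (name != null && studentIds.contains(d.get("StudentId"))) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: both document types live in one list.
        List<Map<String, String>> index = new ArrayList<>();
        index.add(doc("StudentId", "s1", "StudentName", "Alice"));
        index.add(doc("StudentId", "s2", "StudentName", "Bob"));
        index.add(doc("StudentId", "s3", "StudentName", "Carol"));
        index.add(doc("ClassId", "c1", "ClassName", "Algebra", "StudentId", "s1"));
        index.add(doc("ClassId", "c1", "ClassName", "Algebra", "StudentId", "s2"));
        index.add(doc("ClassId", "c2", "ClassName", "Biology", "StudentId", "s3"));

        Set<String> ids = studentIdsForClass(index, "c1");
        System.out.println(studentNames(index, ids)); // prints [Alice, Bob]
    }
}
```

Collecting the IDs into a set first means each student is looked up once even when several class documents mention the same StudentId.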
----- Original Message -----
From: "Rob Jose" <rj...@surewest.net>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, April 22, 2004 11:56 PM
Subject: Doing a join?
> Is it possible to do a join on two fields when searching a Lucene index?
> For example, I have an index of documents that have a "StudentName" and a
> "StudentId" field and another document that has "ClassId", "ClassName" and
> "StudentId". I want to do a search on "ClassId" or "ClassName" and get a
> list of "StudentName". Both of these documents are in one index, but are
> loaded from separate files, so I can't join at creation time. Any help is
> greatly appreciated.
>
> Rob
Doing a join?
Posted by Rob Jose <rj...@surewest.net>.
Is it possible to do a join on two fields when searching a Lucene index?
For example, I have an index of documents that have a "StudentName" and a
"StudentId" field and another document that has "ClassId", "ClassName" and
"StudentId". I want to do a search on "ClassId" or "ClassName" and get a
list of "StudentName". Both of these documents are in one index, but are
loaded from separate files, so I can't join at creation time. Any help is
greatly appreciated.
Rob
Re: more rigid stopword list ?
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
p.s. there is no need to create a new Analyzer to tweak the stop word
list. The analyzers that do stop word removal accept the list as an
argument to an overloaded constructor.
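
To illustrate what a stricter list changes, here is a small library-free sketch. It is an assumption-laden illustration, not Lucene code: the class name, the helper methods and the short base list are hypothetical (the base list is not a copy of StopAnalyzer.ENGLISH_STOP_WORDS), and the whitespace-style tokenizing is far simpler than what a real analyzer does. The merged list is kept as a String array on the assumption that this is the form you would hand to the analyzer constructor mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // Illustrative base list only; NOT a copy of StopAnalyzer.ENGLISH_STOP_WORDS.
    static final String[] BASE_STOP_WORDS = {"a", "an", "and", "the", "of", "to"};

    // Returns the base list plus the extra words, without duplicates.
    static String[] extend(String[] base, String... extra) {
        Set<String> merged = new LinkedHashSet<>(Arrays.asList(base));
        merged.addAll(Arrays.asList(extra));
        return merged.toArray(new String[0]);
    }

    // Lower-cases the text, splits on non-letters and drops every stop word.
    static List<String> removeStopWords(String text, String[] stopWords) {
        Set<String> stops = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] strict = extend(BASE_STOP_WORDS, "we", "need");
        System.out.println(removeStopWords("We need a faster index", BASE_STOP_WORDS));
        // prints [we, need, faster, index]
        System.out.println(removeStopWords("We need a faster index", strict));
        // prints [faster, index]
    }
}
```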
Erik
On Apr 22, 2004, at 1:08 PM, Otis Gospodnetic wrote:
> Moving to lucene-user list.
>
> One of my Lucene articles includes a more comprehensive stop word list
> for English:
>
> http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2#references
>
> Otis
>
> --- hgadm@cswebmail.com wrote:
>> Dear all,
>>
>> for my taste the stopwords included in Lucene (e.g.
>> StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
>> with the SnowballAnalyzer - and I guess also with the
>> StandardAnalyzer) are not strict enough:
>>
>> For example in a sentence with "we need ..." I would
>> consider "we" and "need" as stopwords but they are not
>> stripped by SnowballAnalyzer or StandardAnalyzer.
>>
>> Now:
>> Is there a built-in solution for more restrictive
>> stripping, or should I create my own analyzer in that
>> case with a more restrictive stopword list?
>>
>> If so, are you aware of more rigid lists? (A URI
>> would be great!)
>>
>> Thanks,
>>
>> Holger
Re: RemoteSearchable
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Ah, yet another perfect time to plug Lucene's test suite. There is an
example of it in Lucene's own test code. This is always your best bet
for prime (perhaps contrived, but effective for learning the API)
examples.
Erik
On Apr 30, 2004, at 8:14 PM, Venu Durgam wrote:
>
> I was wondering if you implemented search using RemoteSearchable.
> Would you kindly email sample source code?
>
> Thanks.
> Venu Durgam
>
RemoteSearchable
Posted by Venu Durgam <vd...@yahoo.com>.
I was wondering if you implemented search using RemoteSearchable.
Would you kindly email sample source code?
Thanks.
Venu Durgam